
Efficient estimation of partially linear additive Cox models and variance estimation under shape restrictions

Junjun Lang KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China Yukun Liu Corresponding author: [email protected] KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China Jing Qin National Institute of Allergy and Infectious Diseases, National Institutes of Health, USA
Abstract

Shape-restricted inferences have exhibited empirical success in various applications with survival data. However, certain works fall short of providing a rigorous theoretical justification and an easy-to-use variance estimator with theoretical guarantees. Motivated by Deng et al. (2023), this paper delves into an additive and shape-restricted partially linear Cox model for right-censored data, where each additive component satisfies a specific shape restriction, encompassing monotone increasing/decreasing and convexity/concavity. We systematically investigate the consistency and convergence rates of the shape-restricted maximum partial likelihood estimator (SMPLE) of all the underlying parameters. We further establish the asymptotic normality and semiparametric efficiency of the SMPLE for the linear covariate effect. To estimate the asymptotic variance, we propose an innovative data-splitting variance estimation method that boasts exceptional versatility and broad applicability. Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE has comparable performance with the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and Huang (1999)'s method when the Cox model is violated or the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates based on the SMPLE and its competitors.

Keywords:  Shape restriction; Right-censored data; Additive model; Semiparametric efficiency; Variance estimation

1 Introduction

Shape restrictions (such as monotonicity and convexity) arise naturally in numerous practical scenarios. For instance, the growth curves of animals and plants in ecology and the dose-response in medicine must inherently exhibit non-decreasing characteristics (Chang et al., 2007; Wang and Ghosh, 2012). In the realm of economics, utility and production functions are often concave in income and prices (Matzkin, 1991; Varian, 1984), and cost functions are monotone increasing, concave in input prices, and may exhibit non-increasing or non-decreasing returns to scale (Horowitz and Lee, 2017). In genetic epidemiology studies, the cumulative risk of a disease for individuals possesses monotonicity (Qin et al., 2014). In reliability analysis, the bathtub curve describing the failure rate typically displays convexity.

Incorporating shape restrictions into statistical analysis, apart from its exceptional interpretability and ability to enforce domain-specific constraints, often results in an estimation procedure that is devoid of tuning parameters, enhancing its efficiency and robustness. Therefore, shape-restricted techniques have become an increasingly popular tool for statistical inference and learning in various settings over the past decades. A comprehensive review of shape-restricted nonparametric inference can be found in Groeneboom and Jongbloed (2014) and the references therein. Recently, Chen and Samworth (2016) developed an algorithm for the estimation of the generalized additive model in which each of the additive components is linear or subject to a shape restriction. Balabdaoui et al. (2019) considered the estimation of the index parameter in a single-index model with a monotonically increasing link function. Deng and Zhang (2020) studied minimax and adaptation rates in general multiple isotonic regression. Feng et al. (2022) systematically investigated the theoretical properties of the least squares estimator of an S-shaped regression function.

This paper focuses on statistical inference for right-censored survival data. Let $T$ denote the survival time and $(Z,X)\in\mathbb{R}^{p}\times\mathbb{R}^{d}$ denote a $(p+d)\times 1$ vector of covariates. We consider the partially linear Cox model (Sasieni, 1992, PLCM) for modelling the conditional hazard function, i.e.

\lambda_{T}(t\mid x,z)=\lambda(t)\exp(\beta^{\top}x+g(z)), \qquad (1)

where $\lambda(\cdot)$ is the unspecified baseline hazard function, $\beta\in\mathbb{R}^{d}$ is unspecified and $g(\cdot):\mathbb{R}^{p}\mapsto\mathbb{R}$ is an unknown function. This model reduces to the renowned Cox proportional hazards model (Cox, 1972, 1975) when the covariate $Z$ is absent, and it becomes the nonparametric Cox model (Sleeper and Harrington, 1990; O'Sullivan, 1993) in the absence of $X$.

Many nonparametric techniques have been developed for the estimation of the PLCM, in particular for the linear covariate effect. Examples include profile partial likelihood together with a kernel technique (Heller, 2001), maximum likelihood estimation with a deep neural network (Zhong et al., 2022), and a kernel machine representation method (Rong et al., 2024). However, these methods are either hampered by the curse of dimensionality, lack interpretability for $g(Z)$, or involve tuning parameters whose selection is not always straightforward. Alternatively, Huang (1999) proposed to model $g(Z)$ by a generalized additive model (Hastie and Tibshirani, 1986, 1990), which effectively avoids the curse of dimensionality and enforces an additive effect for the covariate $Z$. Specifically,

g(Z)=\sum_{j=1}^{p}g_{j}(Z_{(j)}), \qquad (2)

where for $1\leq j\leq p$, $Z_{(j)}$ is the $j$-th component of $Z$ and $g_{j}$ is an unknown function. Huang (1999) proposed the use of polynomial splines to fit the unknown additive components. This method entails a number of tuning parameters and yields convergence rates that lack conciseness and elegance. Furthermore, the spline method does not provide good interpretability for the additive covariate $Z_{(j)}$.

Our paper is motivated by the work of Deng et al. (2023), which studied a shape-restricted and additive PLCM. Specifically, under models (1) and (2), they assume that each $g_{j}$ is monotone increasing/decreasing or convex/concave. An active-set optimization algorithm was provided to calculate the shape-restricted maximum likelihood estimator. The shape-restriction strategy facilitates the utilization of prior knowledge regarding the effect of each covariate $Z_{(j)}$ on the log conditional hazard function and leads to a tuning-parameter-free estimation procedure. However, they proved only a consistency result and did not provide any asymptotic normality results. Qin et al. (2021) studied a PLCM with a single additive component subject to shape restrictions, but they did not establish any $\sqrt{n}$-consistency result. In addition, in general shape-restricted inference, even if asymptotic normality results can be established, it is generally challenging to construct reasonable estimators for the asymptotic variances with theoretical guarantees (Groeneboom and Hendrickx, 2017).

This paper makes two main contributions to the literature on additive and shape-restricted PLCMs for survival data. The first contribution is to provide powerful statistical guarantees for the shape-restricted maximum partial likelihood estimator (SMPLE) and the induced Breslow-type estimator of the baseline cumulative hazard function under the model assumptions of Deng et al. (2023). This includes a thorough convergence rate analysis for the estimators of the infinite-dimensional parameters, as well as the asymptotic normality and semiparametric efficiency of the estimator of the linear covariate effect. Our second contribution is to offer an easy-to-use estimator for the asymptotic variance of the linear covariate effect estimator. We show that this variance estimation method always provides consistent estimators once the corresponding asymptotic normality result holds. The method is very flexible and applicable for general purposes, especially in shape-restricted inference, where the theoretical guarantee of a bootstrap variance estimator is generally rather challenging (Groeneboom and Hendrickx, 2017). Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE has comparable performance with the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and Huang (1999)'s method when the Cox model is violated and the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates for the SMPLE and its competitors.

The rest of this paper is organized as follows. Section 2 introduces the notation, the data, and the shape-restricted maximum partial likelihood estimator (SMPLE). Section 3 investigates the convergence rates of the SMPLEs of all the unknown parameters, including $\beta$, the unknown additive components, and the baseline cumulative hazard function. Section 4 establishes the asymptotic normality and semiparametric efficiency of the SMPLE of $\beta$. A novel estimation method is also provided to estimate the asymptotic variance of the SMPLE of $\beta$. A simulation study and a real data analysis are presented in Sections 5 and 6, respectively. Section 7 contains concluding remarks. For clarity, all technical proofs are postponed to the supplementary material.

2 Methodology

2.1 Data and model assumptions

Let $T$ and $(X,Z)$ be the survival time and the vector of covariates, respectively, as in the introduction. Suppose that given $(X,Z)$, the conditional hazard function of $T$ satisfies model (1) with $g(Z)$ satisfying (2). The survival time $T$ may be right censored by a censoring time $C$, and we only observe $Y=\min(T,C)$. Throughout this paper, we use $\mathbf{1}(A)$ to denote the indicator function of a set $A$ and use a subscript 0 to highlight the true counterpart of a parameter. Let $\Delta=\mathbf{1}(T\leq C)$ be the non-censoring indicator. Given $n$ independent and identically distributed (iid) observations $(X_{i},Z_{i},Y_{i},\Delta_{i})$, $1\leq i\leq n$, from $(X,Z,Y,\Delta)$, we wish to infer $(\beta_{0},g_{0})$ and the baseline cumulative hazard function $\Lambda_{0}(y)=\int_{0}^{y}\lambda_{0}(t)\,dt$.

The identifiability of models (1) and (2) was investigated by Deng et al. (2023), following which we assume $\mathbb{E}\{g_{0,j}(Z_{(j)})\Delta\}=0$, $j=1,2,\ldots,p$, for identifiability. Furthermore, we assume that for $1\leq j\leq p$, $g_{0,j}(\cdot)$ satisfies one of four shape restrictions: monotone increasing, monotone decreasing, convex, and concave, which are encoded as shape types 1, 2, 3 and 4, respectively. For any additive function $g=\sum_{j=1}^{p}g_{j}(z_{(j)})$, we define ${\rm sha}(g)=({\rm sha}(g_{1}),\ldots,{\rm sha}(g_{p}))^{\top}$, where ${\rm sha}(h)\in\{1,2,3,4\}$ denotes the shape type of a univariate function $h$. We always denote $\boldsymbol{k}_{0}={\rm sha}(g_{0})$. Let $\mathcal{X}$ be the support of $X$, and for simplicity we assume that the support of $Z_{(j)}$ is $[0,1]$ for $j=1,2,\ldots,p$.

2.2 SMPLE

For any $(\beta,g)$, denote $\eta=(\beta,g)$ and $R_{\eta}(U)=X^{\top}\beta+g(Z)$, where $U=(X^{\top},Z^{\top})^{\top}$. Then $1/n$ times the usual partial log-likelihood is

L_{n}(\eta)=\frac{1}{n}\sum_{i=1}^{n}\Delta_{i}\left[R_{\eta}(U_{i})-\log\left(\sum_{j=1}^{n}\mathbf{1}(Y_{j}\geq Y_{i})\exp(R_{\eta}(U_{j}))\right)\right].
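In code, the partial log-likelihood can be evaluated directly from its definition. The sketch below is purely illustrative (plain Python, no optimization); `R`, `Y` and `Delta` hold the values $R_\eta(U_i)$, $Y_i$ and $\Delta_i$, and the function name is ours.

```python
import math

def partial_log_likelihood(R, Y, Delta):
    """1/n times the Cox partial log-likelihood for linear predictor
    values R[i] = beta^T x_i + g(z_i); Y and Delta are the observed
    times and non-censoring indicators."""
    n = len(R)
    total = 0.0
    for i in range(n):
        if Delta[i] == 1:
            # risk set at Y_i: all subjects still at risk at time Y_i
            risk = sum(math.exp(R[j]) for j in range(n) if Y[j] >= Y[i])
            total += R[i] - math.log(risk)
    return total / n
```

The SMPLE maximizes this criterion over the constrained parameter space; the snippet only shows how the objective itself is assembled.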

We propose to estimate $\eta$ by the shape-restricted maximum partial likelihood estimator (SMPLE),

\hat{\eta}:=(\hat{\beta},\hat{g})=\operatorname{argmax}_{\eta\in\mathbb{R}^{d}\times\mathcal{G}_{\boldsymbol{k}_{0}}}L_{n}(\eta), \qquad (3)

where

\mathcal{G}_{\boldsymbol{k}_{0}}=\bigg\{g:[0,1]^{p}\mapsto\mathbb{R}\mid g(z)=\sum_{j=1}^{p}g_{j}(z_{(j)}),~{\rm sha}(g)=\boldsymbol{k}_{0},~\mathbb{E}\left[\Delta g_{j}(Z_{(j)})\right]=0,~1\leq j\leq p\bigg\}

is the parameter space of $g$. With the SMPLE in (3), we estimate $\Lambda_{0}(y)$ by the Breslow-type estimator

\hat{\Lambda}(y;\hat{\eta})=\frac{1}{n}\sum_{j=1}^{n}\frac{\Delta_{j}}{S_{0,n}(Y_{j},\hat{\eta})}\mathbf{1}(y\geq Y_{j}), \qquad (4)

where $S_{0,n}(y,\eta)=(1/n)\sum_{i=1}^{n}\{\mathbf{1}(Y_{i}\geq y)\exp(R_{\eta}(U_{i}))\}$.
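The Breslow-type estimator (4) is likewise a short computation once fitted values $R_{\hat\eta}(U_i)$ are available. A minimal illustrative sketch in plain Python (function name ours):

```python
import math

def breslow_estimator(y, R, Y, Delta):
    """Breslow-type estimate of the cumulative baseline hazard at time y,
    given fitted linear predictor values R[i] = R_etahat(U_i)."""
    n = len(R)
    est = 0.0
    for j in range(n):
        if Delta[j] == 1 and Y[j] <= y:
            # S_{0,n}(Y_j, etahat): average risk-set contribution at Y_j
            S0 = sum(math.exp(R[i]) for i in range(n) if Y[i] >= Y[j]) / n
            est += 1.0 / (n * S0)
    return est
```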

The SMPLE defined in (3) can be calculated with the active-set algorithm introduced in Deng et al. (2023). Let $\hat{g}_{j}$ be functions satisfying $\hat{g}(Z)=\sum_{j=1}^{p}\hat{g}_{j}(Z_{(j)})$ for all $Z$. The function $\hat{g}(Z)$ is unique only at the observed $Z_{i}$'s and is therefore typically non-unique at other values of $Z$, which is akin to general shape-restricted regression estimators (Chen and Samworth, 2016). This implies that $\hat{g}_{j}(\cdot)$ is usually non-unique for $1\leq j\leq p$, and the solution set of $\hat{g}_{j}(\cdot)$ always contains a piecewise linear function (Deng et al., 2023). See Figure 3 for an illustration.
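To give a flavor of how shape constraints produce piecewise solutions, the toy routine below is the classical pool-adjacent-violators algorithm for least-squares monotone regression. It is not the active-set partial-likelihood algorithm of Deng et al. (2023); it only illustrates the pooling idea behind monotone fits.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares fit of a monotone
    increasing sequence to y, returned as a piecewise-constant list."""
    blocks = []  # each block is [fitted level, number of pooled points]
    for yi in y:
        blocks.append([float(yi), 1])
        # pool adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, n2 = blocks.pop()
            v1, n1 = blocks.pop()
            blocks.append([(v1 * n1 + v2 * n2) / (n1 + n2), n1 + n2])
    fit = []
    for v, n in blocks:
        fit.extend([v] * n)
    return fit
```

As with the SMPLE's additive components, the fitted function is flat (here) or piecewise linear between observed design points and is not uniquely determined elsewhere.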

3 Rate of convergence

The consistency of the SMPLEs $\hat{\beta}$, $\hat{g}$, and $\hat{g}_{j}$ was established by Deng et al. (2023). In this section, we establish their convergence rates. We make the following assumptions.

Assumption 1.

(i) The observed $\{Y_{i}\}_{i=1}^{n}$ lie in the interval $[0,\tau]$ for some $\tau>0$. (ii) Given $U$, $T$ and $C$ are conditionally independent. (iii) $\Lambda_{0}(\tau)<\infty$ and ${\rm pr}(C\geq\tau\mid U)\geq c>0$ almost surely for some constant $c$. (iv) $\mathbb{E}[\Delta X]=0$ and $\mathbb{E}[\Delta]>0$.

Let $\|\cdot\|$ denote the usual Euclidean norm and $\|f(\cdot)\|_{\infty}$ the supremum norm of a real-valued function $f$. For any constant $M>0$, define

\mathcal{K}_{M,\boldsymbol{k}_{0}}:=\left\{\eta\mid\eta=(\beta,g),~g\in\mathcal{G}_{\boldsymbol{k}_{0}},~\|\beta\|+\sum_{j=1}^{p}\|g_{j}(\cdot)\|_{\infty}\leq M\right\}.
Assumption 2.

The support $\mathcal{X}$ of $X$ is a bounded subset of $\mathbb{R}^{d}$ and there exists a positive constant $M_{0}>0$ such that $\eta_{0}\in\mathcal{K}_{M_{0},\boldsymbol{k}_{0}}$.

Assumption 3.

There exists a small positive constant $\epsilon$ such that ${\rm pr}(\Delta=1\mid U)>\epsilon$ almost surely with respect to the probability measure of $U$.

Assumption 4.

The joint density of $(Y,Z,\Delta)$ satisfies

0<\inf_{(y,z)\in[0,\tau]\times[0,1]^{p}}{\rm pr}(Y=y,Z=z,\Delta=1)\leq\sup_{(y,z)\in[0,\tau]\times[0,1]^{p}}{\rm pr}(Y=y,Z=z,\Delta=1)<\infty.
Assumption 5.

When ${\rm sha}(g_{0,j})\in\{3,4\}$, the density function of $Z_{(j)}$ with respect to the Lebesgue measure is uniformly bounded above and below on $[0,1]$.

Assumptions 1–2 are standard in the theoretical analysis of the traditional Cox model and its variants (Huang, 1999; Zhong et al., 2022). Assumption 3 ensures that the probability of being uncensored is positive regardless of the covariate values, and it is used to establish the convergence rate results in Theorem 1. Assumption 4 is used in the calculation of the semiparametric efficiency lower bound (Huang, 1999). In Assumption 5, the upper bound requirement is used to derive some entropy results needed in the proof of Proposition 1, and the lower bound requirement guarantees that the approximation errors of piecewise linear approximations to the convex/concave additive components are small enough in the proof of Theorem 3.

The proposition below establishes the consistency of $R_{\hat{\eta}}(\cdot)$ as an estimator of $R_{\eta_{0}}(\cdot)$, which roughly implies the consistency of $\hat{\eta}=(\hat{\beta},\hat{g})$. We could have proved the consistency of $\hat{\beta}$ and of each $\hat{g}_{j}$ separately; however, given the consistency of $R_{\hat{\eta}}(\cdot)$, those results are not needed in the proofs of the subsequent convergence rate results.

Proposition 1.

Suppose that models (1) and (2) and Assumptions 1, 2 and 5 are satisfied. As $n\rightarrow\infty$, we have

\|R_{\hat{\eta}}(\cdot)-R_{\eta_{0}}(\cdot)\|_{\infty}=o_{p}(1). \qquad (5)

Define $d^{2}(\eta,\eta_{0})=\mathbb{E}_{U}\{(R_{\eta}(U)-R_{\eta_{0}}(U))^{2}\}$, where $\mathbb{E}_{U}$ denotes the expectation with respect to $U$. Let $\|\cdot\|_{L_{2}}$ denote the $L_{2}(P)$ norm and $\rho(\boldsymbol{k}_{0})=0.5+0.5\cdot\mathbf{1}(\cup_{i=1}^{p}\{{\rm sha}(g_{0,i})\in\{1,2\}\})$. One of our main results establishes the convergence rate of the SMPLE $\hat{\eta}$.

Theorem 1.

Assume the same conditions as in Proposition 1. As $n\rightarrow\infty$, we have

d(\hat{\eta},\eta_{0})=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).

Furthermore, if Assumptions 3–4 are satisfied and $I(\beta_{0})$ (defined in (7)) is non-singular, then for $1\leq j\leq p$,

\|\hat{\beta}-\beta_{0}\|=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right),\quad\|\hat{g}_{j}(Z_{(j)})-g_{0,j}(Z_{(j)})\|_{L_{2}}=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).

According to Theorem 1, the rates of convergence of all $\hat{g}_{j}$ are $O_{p}(n^{-2/5})$ if none of the additive components of $g$ is monotone. Conversely, if any additive component of $g$ is monotone, then their convergence rates all slow down to $O_{p}(n^{-1/3})$. An explanation for this finding is that the complexity of the class of bounded and monotone functions is much larger than that of the class of bounded and convex (or concave) functions. These convergence rate results are free of the covariate dimensionality and exhibit a much more elegant form than those in Huang (1999) and Zhong et al. (2022). Theorem 1 also establishes the convergence rate of $\hat{\beta}$, although it is sub-optimal.

With Theorem 1, we are able to establish the uniform rate of convergence of the Breslow-type estimator $\hat{\Lambda}(y;\hat{\eta})$ in (4) of the baseline cumulative hazard function $\Lambda_{0}(y)$. It turns out that $\hat{\Lambda}(y;\hat{\eta})$ has the same convergence rate as $\hat{\eta}$, although their convergence rates are quantified by different distances.

Theorem 2.

Assume the same conditions as in Proposition 1. As $n\rightarrow\infty$, it holds that

\sup_{y\in[0,\tau]}\left|\hat{\Lambda}(y;\hat{\eta})-\Lambda_{0}(y)\right|=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).
Remark 1.

In practice, one may impose a combination of monotonicity and convexity/concavity constraints on the additive components according to domain knowledge. See Chen and Samworth (2016), Kuchibhotla et al. (2023) and Deng et al. (2023) for further motivation for additional shape constraints. Proposition 1 and Theorems 1–4 still hold when model (2) incorporates additive components that satisfy both monotonicity and convexity/concavity restrictions. An intuitive explanation for this result is that the parameter space $\mathcal{G}_{\boldsymbol{k}_{0}}$ is reduced by additional constraints on the additive components, and this can lead to convergence rates for the SMPLE that are at least as good.

4 Asymptotic normality and efficiency

Based on the convergence rate results in the previous section, we further show in this section that the SMPLE $\hat{\beta}$ in (3) of the linear covariate effect is asymptotically normal and semiparametrically efficient, in the sense that its asymptotic variance achieves the semiparametric efficiency lower bound (Bickel et al., 1993), or the information bound, for estimating $\beta_{0}$ under models (1) and (2).

We begin by presenting the information bound for $\beta_{0}$. Recall that $U=(X^{\top},Z^{\top})^{\top}$, and define

M(y)\equiv M(y\mid Y,\Delta,U)=\Delta\mathbf{1}(Y\leq y)-\int_{0}^{y}\mathbf{1}(Y\geq t)\exp\{R_{\eta_{0}}(U)\}d\Lambda_{0}(t),

which is a counting process martingale associated with the Cox model. The log-likelihood of model (1) based on one observation $(X,Z,Y,\Delta)$ is (up to a constant)

\ell(\beta,g,\Lambda)=\Delta\log\lambda(Y)+\Delta\{X^{\top}\beta+g(Z)\}-\Lambda(Y)\exp\{X^{\top}\beta+g(Z)\}. \qquad (6)

Consider parametric smooth sub-models $\{\lambda_{(\nu)}:\nu\in\mathbb{R}\}$ and $\{g_{j,(\nu)}:\nu\in\mathbb{R}\}$, $1\leq j\leq p$, with $\lambda_{(0)}=\lambda_{0}$ and $g_{j,(0)}=g_{0,j}$. Define $L_{2}(P_{Y})$ to be the set of functions $a(\cdot)$ satisfying $\mathbb{E}\{\Delta a^{2}(Y)\}<\infty$ and $a(y)=\partial\log\lambda_{(\nu)}(y)/\partial\nu|_{\nu=0}$ for some sub-model. Similarly, for $1\leq j\leq p$, define $L_{2}^{0}(P_{Z_{(j)}})$ to be the set of functions $h_{j}$ satisfying $\mathbb{E}\{\Delta h_{j}(Z_{(j)})\}=0$, $\mathbb{E}\{\Delta h_{j}^{2}(Z_{(j)})\}<\infty$ and $h_{j}(z_{(j)})=\partial g_{j,(\nu)}(z_{(j)})/\partial\nu|_{\nu=0}$ for some sub-model. The following lemma, which is Theorem 3.1 of Huang (1999), gives the information bound for $\beta_{0}$.

Lemma 1 (Theorem 3.1 of Huang (1999)).

Suppose that models (1) and (2) and Assumptions 1–4 are satisfied. Let $((\boldsymbol{a}^{\star})^{\top},(\boldsymbol{h}_{1}^{\star})^{\top},\cdots,(\boldsymbol{h}_{p}^{\star})^{\top})^{\top}$ be the unique vector-valued function in $L_{2}(P_{Y})^{d}\times L_{2}^{0}(P_{Z_{(1)}})^{d}\times\cdots\times L_{2}^{0}(P_{Z_{(p)}})^{d}$ that minimizes

\mathbb{E}\left\{\Delta\|X-\boldsymbol{a}(Y)-\boldsymbol{h}_{1}(Z_{(1)})-\cdots-\boldsymbol{h}_{p}(Z_{(p)})\|^{2}\right\}.
  • (1)

    The efficient score for the estimation of $\beta_{0}$ is

    \ell_{\beta_{0}}^{\star}(Y,\Delta,U)=\int_{0}^{\tau}\{X-\boldsymbol{a}^{\star}(y)-\boldsymbol{h}^{\star}(Z)\}dM(y),

    where $\boldsymbol{h}^{\star}(Z)=\sum_{j=1}^{p}\boldsymbol{h}_{j}^{\star}(Z_{(j)})$ and $\boldsymbol{a}^{\star}(y)=\mathbb{E}\{X-\boldsymbol{h}^{\star}(Z)\mid Y=y,\Delta=1\}$.

  • (2)

    The information bound for the estimation of $\beta_{0}$ is

    I(\beta_{0})=\mathbb{E}\left[\{\ell_{\beta_{0}}^{\star}(Y,\Delta,U)\}^{\otimes 2}\right]=\mathbb{E}\left[\Delta\{X-\boldsymbol{a}^{\star}(Y)-\boldsymbol{h}^{\star}(Z)\}^{\otimes 2}\right], \qquad (7)

    where $A^{\otimes 2}=AA^{\top}$ for any vector or matrix $A$.

Additional assumptions are needed to obtain the asymptotic normality and efficiency of $\hat{\beta}$. Denote $\boldsymbol{h}_{j}^{\star}=(\boldsymbol{h}_{j,1}^{\star},\cdots,\boldsymbol{h}_{j,d}^{\star})^{\top}$ for $1\leq j\leq p$.

Assumption 6.

When ${\rm sha}(g_{0,j})\in\{1,2\}$, there exist constants $\tilde{C}_{1}>0$ and $\tilde{C}_{2}>0$ such that $\|\boldsymbol{h}^{\star}_{j}(x_{1})-\boldsymbol{h}^{\star}_{j}(y_{1})\|\leq\tilde{C}_{1}|x_{1}-y_{1}|$ and $|g_{0,j}^{-1}(x_{2})-g_{0,j}^{-1}(y_{2})|\leq\tilde{C}_{2}|x_{2}-y_{2}|$ for all $x_{1},y_{1}\in[0,1]$ and $x_{2},y_{2}\in[-M_{0},M_{0}]$.

Assumption 7.

When ${\rm sha}(g_{0,j})\in\{3,4\}$, the function $\boldsymbol{h}_{j,\tilde{i}}^{\star}(\cdot)$ is $\bar{p}$-Hölder continuous for all $1\leq\tilde{i}\leq d$ and some $\bar{p}\in(\rho(\boldsymbol{k}_{0}),\,2]$.

Assumption 8.

When ${\rm sha}(g_{0,j})=3$, the function $g_{0,j}(t)-\tilde{C}t^{2}$ is convex for some constant $\tilde{C}>0$; when ${\rm sha}(g_{0,j})=4$, the function $g_{0,j}(t)+\tilde{C}t^{2}$ is concave for some constant $\tilde{C}>0$.

Assumption 6 is used to control the fluctuation of the score function corresponding to the monotone components in the direction of the projection defined in Lemma 1; Huang (2002) and Cheng (2009) adopted similar assumptions. Assumptions 7–8, which are analogues of Assumption B1 of Kuchibhotla et al. (2023), are used for approximations of the convex (concave) components.

Theorem 3.

Suppose that models (1) and (2) and Assumptions 1–8 are satisfied. Further assume that $I(\beta_{0})$ is non-singular. Then as $n\rightarrow\infty$, $\sqrt{n}(\hat{\beta}-\beta_{0})\stackrel{d}{\longrightarrow}N(0,I^{-1}(\beta_{0}))$. This implies that the asymptotic variance achieves the information bound and that $\hat{\beta}$ is asymptotically semiparametrically efficient among all regular estimators of $\beta_{0}$.

By Theorem 3, $\hat{\beta}$ has an asymptotically normal distribution with asymptotic variance $I^{-1}(\beta_{0})$. When making inference about $\beta_{0}$ based on this theorem, we need to construct a consistent estimator of $I^{-1}(\beta_{0})$. However, $I^{-1}(\beta_{0})$, or equivalently $I(\beta_{0})$, has a rather complicated form, making its plug-in estimator difficult to use. To overcome this difficulty, we propose a novel data-splitting method to estimate $I^{-1}(\beta_{0})$.

4.1 Data-splitting variance estimation and inference on β\beta

We introduce the proposed data-splitting variance estimation method in a general setting because it is generally applicable. Let $\theta_{0}\in\mathbb{R}^{d}$ be a functional of a statistical population $\mathscr{P}$ and $\hat{\theta}_{n}$ be an estimator of $\theta_{0}$ based on iid samples $\{O_{i}\}_{i=1}^{n}$ from $\mathscr{P}$. Suppose that $\sqrt{n}(\hat{\theta}_{n}-\theta_{0})\overset{d}{\longrightarrow}N(0,\Sigma)$, where $\Sigma$ is a positive semi-definite matrix. Let $k_{n}<n$ and $k_{n}\rightarrow\infty$. We partition the sample into $\lfloor k_{n}\rfloor$ subsamples, each of which has $m_{n}=\lfloor n/k_{n}\rfloor$ observations, and let $\hat{\theta}_{ni}$ denote the estimator of $\theta_{0}$ based on the $i$-th subsample, $1\leq i\leq\lfloor k_{n}\rfloor$. Our data-splitting estimator of the asymptotic variance $\Sigma$ is defined as

\hat{\Sigma}=\frac{m_{n}}{\lfloor k_{n}\rfloor}\sum_{i=1}^{\lfloor k_{n}\rfloor}(\hat{\theta}_{ni}-\bar{\hat{\theta}}_{n})(\hat{\theta}_{ni}-\bar{\hat{\theta}}_{n})^{\top}, \qquad (8)

where $\bar{\hat{\theta}}_{n}$ is the sample mean of $\hat{\theta}_{n1},\ldots,\hat{\theta}_{n\lfloor k_{n}\rfloor}$. For better stability, we may repeat the above splitting-and-estimation procedure many times and take the average of the resulting variance estimates as the final variance estimate.
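For a scalar estimator ($d=1$), the estimator (8) amounts to $m_n$ times the empirical variance of the subsample estimates. A minimal sketch in plain Python; the random shuffling step and the default $k_n=n^{\tilde\alpha}$ with $\tilde\alpha=0.5$ are illustrative choices of ours, not prescribed by the paper.

```python
import random

def split_variance(estimator, sample, alpha_tilde=0.5, seed=0):
    """Data-splitting estimate of the asymptotic variance Sigma of a
    scalar estimator theta_hat with sqrt(n)(theta_hat - theta0) -> N(0, Sigma).
    `estimator` maps a list of observations to a scalar estimate."""
    n = len(sample)
    k_n = int(n ** alpha_tilde)   # number of subsamples
    m_n = n // k_n                # subsample size
    rng = random.Random(seed)
    data = list(sample)
    rng.shuffle(data)             # one random partition of the sample
    ests = [estimator(data[i * m_n:(i + 1) * m_n]) for i in range(k_n)]
    mean = sum(ests) / k_n
    # m_n times the empirical variance of the subsample estimates, as in (8)
    return m_n * sum((e - mean) ** 2 for e in ests) / k_n
```

For instance, applied to the sample mean of iid standard normal data, the returned value should be close to the true asymptotic variance 1.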

Theorem 4.

Let $\hat{\theta}_{n}$ be an estimator of $\theta_{0}$ based on iid samples $\{O_{i}\}_{i=1}^{n}$ such that $\sqrt{n}(\hat{\theta}_{n}-\theta_{0})\overset{d}{\longrightarrow}N(0,\Sigma)$. Let $k_{n}=n^{\tilde{\alpha}}$ for some $\tilde{\alpha}\in(0,1)$, $m_{n}=\lfloor n^{1-\tilde{\alpha}}\rfloor$, and let $\hat{\Sigma}$ be the variance estimator in (8). Then $\hat{\Sigma}=\Sigma+o_{p}(1)$ as $n\rightarrow\infty$.

Theorem 4 guarantees the validity of the data-splitting variance estimator. This method is very easy to use and flexible enough for general purposes. Alternatively, one could construct a variance estimator by the bootstrap. However, the consistency of a bootstrap variance estimator often requires stronger conditions (Groeneboom and Hendrickx, 2017) and is often very difficult to prove, especially under shape restrictions.

As a specific application, we apply the data-splitting method to construct an estimator of the information bound, or equivalently of the asymptotic variance $I^{-1}(\beta_{0})$ of $\hat{\beta}$. Denote the resulting estimator by $\widehat{I^{-1}(\beta_{0})}$, which is consistent for $I^{-1}(\beta_{0})$ by Theorem 4. Therefore $n(\hat{\beta}-\beta_{0})^{\top}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta_{0})$ asymptotically follows $\chi^{2}(d)$, the chi-square distribution with $d$ degrees of freedom. For $\alpha\in(0,1)$, let $\chi^{2}_{1-\alpha}(d)$ be the $(1-\alpha)$ quantile of $\chi^{2}(d)$. A $(1-\alpha)$-level confidence region for $\beta_{0}$ can be constructed as

\{\beta:n(\hat{\beta}-\beta)^{\top}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta)\leq\chi^{2}_{1-\alpha}(d)\}. \qquad (9)

For testing the hypothesis $H_{0}:\beta=\beta_{0}$ versus $H_{1}:\beta\neq\beta_{0}$, we propose to reject $H_{0}$ at the significance level $\alpha$ if

n(\hat{\beta}-\beta_{0})^{\top}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta_{0})>\chi^{2}_{1-\alpha}(d). \qquad (10)

By Theorems 3 and 4, the confidence region (9) has an asymptotically correct $(1-\alpha)$ coverage probability, and the test defined by the rejection region (10) has an asymptotically correct type I error $\alpha$.
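When $d=1$, the region (9) reduces to the familiar Wald interval, since $\chi^{2}_{1-\alpha}(1)=z_{1-\alpha/2}^{2}$. A small illustrative helper in plain Python (the function name and arguments are ours):

```python
import math
from statistics import NormalDist

def wald_interval(beta_hat, avar_hat, n, alpha=0.05):
    """(1 - alpha) confidence interval for a scalar beta0 based on
    sqrt(n)(beta_hat - beta0) -> N(0, avar); avar_hat is the
    data-splitting estimate of the asymptotic variance."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half = z * math.sqrt(avar_hat / n)
    return beta_hat - half, beta_hat + half
```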

5 Simulations

In this section, we conduct simulations to assess the finite-sample performance of the proposed SMPLE $\hat{\beta}$ and the proposed confidence region (9) for the linear covariate effect $\beta$. To generate data, we take $X$ and $Z$ to be two scalar random variables, iid from the standard normal distribution, and take the conditional distribution of $T$ given $(X,Z)$ to be an exponential distribution with mean $1/\exp(\beta_{0}X+g_{0}(Z))$. Therefore the conditional hazard function of $T$ given $(X,Z)$ is $\exp(\beta_{0}X+g_{0}(Z))$. We set the censoring time $C$ to follow a uniform distribution on $(0,c)$. We consider three scenarios for $g_{0}$: (I) $g_{0}(z)=-2z$, (II) $g_{0}(z)=-|z|^{3}/2$ and (III) $g_{0}(z)=2|z|$. We set $\beta_{0}=-2$ and consider two choices of $c$, namely 5 and 10, and two sample sizes, 600 and 800. A larger $c$ results in a smaller censoring proportion.
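The data-generating mechanism described above can be sketched as follows (plain Python; the function name and defaults are ours):

```python
import math
import random

def generate_sample(n, beta0=-2.0, c=5.0, scenario="I", seed=1):
    """Simulate (X, Z, Y, Delta): X, Z iid N(0,1); T | (X,Z) exponential
    with hazard exp(beta0*X + g0(Z)); C ~ Uniform(0, c); Y = min(T, C)."""
    g0 = {"I": lambda z: -2 * z,
          "II": lambda z: -abs(z) ** 3 / 2,
          "III": lambda z: 2 * abs(z)}[scenario]
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, z = rng.gauss(0, 1), rng.gauss(0, 1)
        t = rng.expovariate(math.exp(beta0 * x + g0(z)))  # rate = hazard
        cens = rng.uniform(0, c)
        data.append((x, z, min(t, cens), int(t <= cens)))
    return data
```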

When implementing our SMPLE, we set $\boldsymbol{k}_{0}$ to be 3, 4, and 3 in Scenarios I–III, respectively. For comparison, we also consider the traditional Cox regression estimator (TCR) of $\beta$ and the partial likelihood estimator of $\beta$ with $r$-order polynomial splines under the partially linear additive model (Huang, 1999, SPLA-$r$), where $r$ may be 2, 3 or 4. We generate 1000 random samples to evaluate the performance of the above five estimation methods.

5.1 Point estimation

Table 1 presents 100 times the simulated root mean square errors (RMSEs) and absolute biases (BIASs) of these estimators. The model assumptions of SMPLE and SPLA-$r$ are correct in all three scenarios, whereas the standard Cox model is correctly specified only in Scenario I. As expected, in Scenario I, TCR has uniformly the best performance among the five estimators under comparison in terms of RMSE and BIAS; nevertheless, the SMPLE and SPLA-$r$ estimators have almost the same RMSEs and BIASs. When the standard Cox model is misspecified, as in Scenarios II and III, TCR has much larger RMSEs and BIASs than SMPLE and SPLA-$r$; equivalently, SMPLE and SPLA-$r$ have a clear advantage over TCR. Compared with SPLA-$r$, SMPLE is comparable and slightly inferior in Scenarios I and II, but has uniformly much smaller RMSEs and BIASs in Scenario III. A possible explanation for this phenomenon is that, although continuous in all three scenarios, the hazard function is smooth in Scenarios I and II but nonsmooth in Scenario III. As the sample size $n$ or the constant $c$ increases, we have more completely observed data, and consequently all estimators have improved performance when the underlying model assumption is correct. An exception is the performance of TCR in Scenarios II and III, where TCR has larger RMSEs and BIASs as $n$ or $c$ increases.

Figure 1 displays boxplots of the SMPLE and SPLA-$r$ ($r=2,3,4$) estimates (minus $\beta_{0}$) of $\beta$ when the sample size is $n=800$. TCR is excluded here because it has extremely large RMSEs and BIASs in Scenarios II and III. SMPLE and the three SPLA-$r$ estimators exhibit almost the same performance in Scenario I. In Scenario II, where the true hazard function is smooth, the four methods have similar variances, but the biases shrink steadily from SMPLE to SPLA-2, SPLA-3, and SPLA-4. In Scenario III, where the true hazard function is nonsmooth, the four methods again have similar variances; however, the three SPLA-$r$ estimators have much larger biases than SMPLE, whose bias is negligible.

Table 1: Simulated root mean square errors and absolute biases (in parentheses) of the five point estimators under comparison. All results have been multiplied by 100.
Scenario  c   n    SMPLE         TCR            SPLA-2         SPLA-3         SPLA-4
I         5   600  9.25 (7.29)   9.13 (7.17)    9.17 (7.21)    9.22 (7.27)    9.29 (7.32)
              800  8.04 (6.36)   7.95 (6.30)    7.99 (6.33)    8.01 (6.35)    8.03 (6.36)
          10  600  8.96 (7.19)   8.86 (7.07)    8.90 (7.11)    8.96 (7.17)    9.01 (7.21)
              800  7.65 (6.07)   7.57 (6.03)    7.61 (6.05)    7.62 (6.05)    7.64 (6.06)
II        5   600  10.29 (8.19)  68.89 (67.57)  9.46 (7.65)    9.46 (7.66)    9.70 (7.78)
              800  8.49 (6.68)   70.06 (69.02)  8.17 (6.53)    8.10 (6.44)    8.09 (6.36)
          10  600  9.89 (7.86)   78.58 (77.53)  9.21 (7.42)    9.18 (7.40)    9.30 (7.44)
              800  8.20 (6.46)   79.62 (78.79)  8.04 (6.48)    7.93 (6.38)    7.78 (6.15)
III       5   600  8.54 (6.87)   63.46 (63.15)  20.54 (18.83)  19.39 (17.63)  9.74 (7.95)
              800  7.28 (5.74)   63.61 (63.35)  20.92 (19.60)  19.98 (18.62)  9.28 (7.74)
          10  600  8.43 (6.78)   63.70 (63.40)  21.05 (19.40)  19.91 (18.21)  9.86 (8.04)
              800  7.17 (5.66)   63.71 (63.47)  21.25 (20.01)  20.31 (19.04)  9.34 (7.78)
Figure 1: Boxplots of the SMPLE, SPLA-2, SPLA-3 and SPLA-4 estimates (minus $\beta_{0}$) of $\beta$ when the sample size is $n=800$.
Table 2: Simulated coverage probabilities and average interval lengths (in parentheses) of the confidence intervals at the 95% confidence level based on the five estimators under comparison.
Scenario  c   n    SMPLE          TCR            SPLA-2         SPLA-3         SPLA-4
I         5   600  0.947 (0.412)  0.930 (0.384)  0.940 (0.396)  0.944 (0.409)  0.948 (0.422)
              800  0.941 (0.343)  0.937 (0.324)  0.937 (0.332)  0.941 (0.340)  0.949 (0.348)
          10  600  0.940 (0.388)  0.930 (0.364)  0.937 (0.374)  0.940 (0.385)  0.942 (0.395)
              800  0.945 (0.327)  0.935 (0.310)  0.941 (0.317)  0.949 (0.325)  0.948 (0.332)
II        5   600  0.947 (0.459)  0.001 (0.452)  0.941 (0.410)  0.951 (0.422)  0.951 (0.440)
              800  0.950 (0.382)  0.000 (0.396)  0.926 (0.346)  0.938 (0.355)  0.949 (0.368)
          10  600  0.943 (0.429)  0.000 (0.437)  0.929 (0.383)  0.935 (0.393)  0.938 (0.407)
              800  0.945 (0.355)  0.000 (0.383)  0.919 (0.321)  0.931 (0.329)  0.942 (0.340)
III       5   600  0.955 (0.384)  0.000 (0.261)  0.424 (0.339)  0.494 (0.350)  0.908 (0.364)
              800  0.955 (0.321)  0.000 (0.223)  0.260 (0.289)  0.302 (0.296)  0.863 (0.304)
          10  600  0.952 (0.373)  0.000 (0.254)  0.370 (0.330)  0.441 (0.340)  0.893 (0.353)
              800  0.948 (0.312)  0.000 (0.217)  0.233 (0.281)  0.279 (0.288)  0.847 (0.296)

5.2 Interval estimation

We next compare the proposed confidence interval (CI) for $\beta$ with those based on TCR and the three SPLA-$r$ estimators. To be specific, let $\tilde{\beta}$ denote a generic estimator of $\beta$ and let $\sigma^{2}$ denote its asymptotic variance. We use the proposed data-splitting method with $\tilde{\alpha}=0.35$ to estimate $\sigma^{2}$, and let $\tilde{\sigma}^{2}$ be the resulting estimator. A Wald-type confidence interval for $\beta$ at the 95% confidence level is then $\tilde{\beta}\pm n^{-1/2}\tilde{\sigma}u_{0.975}$, with length $2n^{-1/2}\tilde{\sigma}u_{0.975}$, where $u_{0.975}$ is the 0.975 quantile of the standard normal distribution. Table 2 presents the simulated coverage probabilities and average interval lengths of the Wald-type confidence intervals based on the five estimators.
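As a concrete illustration of this interval construction, the sketch below pairs a data-splitting variance estimate with the Wald interval. The exact splitting scheme is an assumption on our part (disjoint subsamples of size roughly $n^{\tilde{\alpha}}$, with the variance read off the between-subsample spread), and for simplicity the target estimator here is the sample mean rather than the SMPLE:

```python
import numpy as np
from statistics import NormalDist

def split_variance(data, estimator, alpha_tilde=0.35, seed=None):
    """Illustrative data-splitting variance estimate: split the n
    observations into m disjoint subsamples of size b ~ n^alpha_tilde,
    re-estimate on each, and rescale the between-subsample variance
    (Var of a size-b estimate is roughly sigma^2 / b)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    b = max(2, int(n ** alpha_tilde))          # subsample size
    m = n // b                                 # number of subsamples
    idx = rng.permutation(n)
    betas = [estimator(data[idx[k * b:(k + 1) * b]]) for k in range(m)]
    return b * np.var(betas, ddof=1)

def wald_ci(beta_tilde, sigma2_tilde, n, level=0.95):
    """Wald-type CI: beta_tilde +/- n^{-1/2} * sigma_tilde * u_{(1+level)/2}."""
    u = NormalDist().inv_cdf((1.0 + level) / 2.0)
    half = u * (sigma2_tilde / n) ** 0.5
    return beta_tilde - half, beta_tilde + half

# toy check with the sample mean, whose asymptotic variance is Var(X) = 1
data = np.random.default_rng(1).standard_normal(4000)
sigma2 = split_variance(data, np.mean, seed=1)
lo, hi = wald_ci(data.mean(), sigma2, len(data))
print(sigma2, (lo, hi))
```

In the simulation above, `sigma2` lands near the true asymptotic variance 1 of the standardized sample mean, and the interval covers the true value 0 with roughly the nominal frequency over repetitions.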

We observe that in all three scenarios, the SMPLE-based CI always has very accurate coverage probabilities. In contrast, the TCR-based CI has desirable coverage accuracy only in Scenario I, where the Cox model is correct, and its performance in Scenarios II and III is unacceptable. The CIs based on SPLA-$r$ have acceptable coverage accuracy in Scenarios I and II, despite slight undercoverage, but perform very poorly in Scenario III. Interestingly, the CI based on SPLA-$r$ with a larger $r$ has more accurate coverage. For more insight into these results, Figure 2 displays QQ-plots of the standardized estimates $\sqrt{n}\{\widehat{I^{-1}(\beta_{0})}\}^{-1/2}(\tilde{\beta}-\beta_{0})$ for $\tilde{\beta}$ being the SMPLE and SPLA-4 estimators. The distribution of the standardized SMPLE is always very close to the standard normal. In contrast, the distribution of the standardized SPLA-4 estimator is close to the standard normal in Scenarios I and II, but far from it in Scenario III. This may explain why the SMPLE-based CI always has desirable coverage accuracy while the SPLA-4-based CI suffers severe undercoverage in Scenario III.

In summary, when the Cox model is correct, the point and interval estimators based on the proposed SMPLE perform very closely to those based on TCR, the optimal estimation method under the Cox model; they have obvious advantages over those based on TCR or SPLA-$r$ when the Cox model is incorrect or the hazard function is nonsmooth. Another advantage of SMPLE over SPLA-$r$ is that it is free of tuning parameters, whereas the latter is not. Tuning parameters usually have a large influence on the subsequent analysis, yet their optimal choices are difficult to determine in practice.

Figure 2: QQ-plots of the standardized estimates $\sqrt{n}\{\widehat{I^{-1}(\beta_{0})}\}^{-1/2}(\tilde{\beta}-\beta_{0})$ of SMPLE (blue circle) and SPLA-4 (green cross) versus $N(0,1)$. Columns 1–3: Scenarios I–III; Row 1: $c=5$ and $n=600$; Row 2: $c=10$ and $n=600$; Row 3: $c=5$ and $n=800$; Row 4: $c=10$ and $n=800$.

6 A real application

We apply the proposed SMPLE method to analyze the Rotterdam Breast Cancer (RBC) dataset, publicly available from the R package survival. This dataset was used in Royston and Altman (2013) to perform external validation of a Cox prognostic model. The RBC dataset comprises 2982 primary breast cancer patients whose records were included in the Rotterdam tumor bank. We focus on potential factors affecting the survival time ($T$) of breast cancer patients, defined as the days from primary surgery to death. We consider four covariates: progesterone receptors (unit: fmol/l, $X$), age at surgery ($Z_{1}$), number of positive lymph nodes ($Z_{2}$), and estrogen receptors (unit: fmol/l, $Z_{3}$). Let $Z=(Z_{1},Z_{2},Z_{3})^{\top}$. All covariates have been rescaled to lie between 0 and 1. The impact of the level of progesterone receptors on the treatment of breast cancer is unclear. We model the conditional hazard of $T$ given $(X=x,Z=z)$ by the following partially linear additive Cox model

$$\lambda_{T}(t\mid x,z)=\lambda(t)\exp\{x\beta+f_{1}(z_{1})+f_{2}(z_{2})+f_{3}(z_{3})\},\qquad(11)$$

where $\lambda(\cdot)$, $f_{1}(\cdot)$, $f_{2}(\cdot)$ and $f_{3}(\cdot)$ are all left unspecified. Since elderly patients are typically predisposed to a higher risk of cancer deterioration or recurrence, we postulate that $f_{1}(\cdot)$ is monotonically increasing. Similarly, $f_{2}(\cdot)$ and $f_{3}(\cdot)$ are also assumed to be monotonically increasing. Inspired by our extensive simulation results and the discussion in Remark 1, we further impose a convexity or concavity restriction on $f_{1}(\cdot)$–$f_{3}(\cdot)$.
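Monotonicity restrictions of this kind are typically enforced through pool-adjacent-violators (PAVA) steps inside an active-set algorithm. The least-squares form below is only an illustrative building block, not the paper's partial-likelihood algorithm:

```python
def pava(y):
    """Pool-adjacent-violators: least-squares projection of the sequence y
    onto the cone of nondecreasing sequences.  Illustrative only; the SMPLE
    imposes monotonicity inside the partial likelihood, not least squares."""
    blocks = []                    # each block: [size, mean]
    for v in y:
        blocks.append([1, float(v)])
        # merge adjacent blocks while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][1] > blocks[-1][1]:
            n2, m2 = blocks.pop()
            n1, m1 = blocks.pop()
            blocks.append([n1 + n2, (n1 * m1 + n2 * m2) / (n1 + n2)])
    fit = []
    for size, mean in blocks:
        fit.extend([mean] * size)
    return fit

print(pava([3, 1, 2]))      # -> [2.0, 2.0, 2.0]
print(pava([1, 3, 2, 4]))   # -> [1.0, 2.5, 2.5, 4.0]
```

The output is piecewise constant on pooled blocks, which is why shape-restricted fits such as those in Figure 3 come out piecewise linear once convexity/concavity is also imposed.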

Because menopause generally affects the therapeutic effect and the mortality of breast cancer, we categorize the patients into two groups according to whether they have reached menopause. Group I (premenopausal) comprises 1312 samples with a censoring rate of 64.33%, while Group II (postmenopausal) consists of 1670 samples with a censoring rate of 51.86%. We apply the point and interval estimation methods considered in the simulation section to estimate the linear covariate effect under model (11). We set the confidence level to 95% when constructing confidence intervals. In the proposed variance estimation method, we take $\tilde{\alpha}=0.3$ and construct variance estimates by repeating the splitting-and-estimating procedure twenty times and averaging. Table 3 summarizes the analysis results.

Table 3: Results of point and interval estimation for the RBC dataset.
                Group I                            Group II
          Point estimate  Interval estimate  Point estimate  Interval estimate
SMPLE     -5.96           [-8.96, -2.95]     -0.62           [-1.23, -0.01]
TCR       -5.19           [-8.22, -2.16]     -0.70           [-1.41, 0.01]
SPLA-2    -5.33           [-8.48, -2.18]     -0.39           [-1.08, 0.30]
SPLA-3    -4.98           [-8.26, -1.69]     -0.40           [-1.13, 0.33]
SPLA-4    -5.15           [-8.66, -1.64]     -0.34           [-1.08, 0.39]

The point estimates of $\beta$ are about $-5$ or less in Group I, much smaller than those in Group II, which are $-0.7$ or larger. This implies that the effect of progesterone receptors ($X$) on the hazard rate of the survival time of breast cancer patients increases significantly from Group I to Group II, potentially stemming from the substantial reduction in progesterone levels in women's bodies after menopause. For patients who have not reached menopause, the 95% confidence intervals derived from all five methods exclude 0, suggesting a significant effect of progesterone receptors on the survival time of breast cancer patients. Among these methods, our proposed SMPLE produces the shortest confidence interval. Conversely, for postmenopausal patients, only the SMPLE-based confidence interval excludes 0, which, together with the point estimate $-0.62$, suggests that progesterone receptors still have a significant effect in prolonging the survival time of breast cancer patients. Once again, the SMPLE yields the shortest confidence interval. Given our reasonable shape-restriction assumptions and our simulation experience, especially in Scenario III, we believe the analysis results based on the SMPLE are more reliable, which also demonstrates its advantage over the competing methods.

Figure 3 depicts the estimates of $f_{1}(\cdot)$–$f_{3}(\cdot)$ in model (11) under our shape constraints. All the estimated curves are continuous, piecewise linear functions in both groups. Both the number of positive lymph nodes ($Z_{2}$) and estrogen receptors ($Z_{3}$) have very similar effects on the survival time $T$ in the two groups. In contrast, age at surgery ($Z_{1}$) has a markedly more pronounced effect on $T$ in Group II than in Group I. This discrepancy may stem from the fact that premenopausal women tend to be younger, so age has not yet exerted a substantial influence on their physical condition.

Figure 3: Curves of the fitted $f_{1}(\cdot)$–$f_{3}(\cdot)$ in model (11) by the proposed shape-restricted maximum likelihood estimation. Row 1: Group I; Row 2: Group II.

7 Concluding remarks

In this paper, we study an additive and shape-restricted partially linear Cox model. We systematically investigate the consistency and convergence rates of the SMPLEs of the additive components and the baseline cumulative hazard function. Notably, the convergence rates of the SMPLEs of the infinite-dimensional parameters, including the additive components and $\Lambda(\cdot)$, are independent of the covariate dimension, exhibiting an elegant form. We find that the SMPLE of $\beta$ is $\sqrt{n}$-consistent and asymptotically semiparametrically efficient.

To estimate the asymptotic variance, we propose a flexible variance estimation method based on data splitting. We prove that the resulting variance estimator is consistent whenever $\tilde{\alpha}\in(0,1)$, namely, whenever both the number of split samples and the sample size of each split sample go to infinity. With this variance estimator, valid confidence regions for the linear covariate effect can be constructed. We set $\tilde{\alpha}$ to around 0.3 in our numerical studies. Natural questions are whether there is an optimal choice of $\tilde{\alpha}$ and, if so, how to determine it. For the time being, we have no results on these questions, which are left for future research.

In addition to the efficiency study of the estimation of the linear covariate effect, it is also of compelling interest to develop the minimax lower bound for the estimation of the additive, shape-restricted components. We conjecture that, by leveraging the techniques in Guntuboyina and Sen (2015) and Zhong et al. (2022), the minimax risk under the $L_{2}(P)$ norm for a convex/concave component is (asymptotically) lower bounded by $n^{-2/5}$ up to a constant factor. Furthermore, if none of the additive components is monotonic, the SMPLEs of the convex/concave components are rate-optimal, as suggested by Theorem 1. However, a formal proof of this result is beyond the scope of this paper.

References

  • Balabdaoui et al. (2019) Balabdaoui, F., Groeneboom, P., and Hendrickx, K. (2019). Score estimation in the monotone single-index model. Scandinavian Journal of Statistics, 46(2), 517–544.
  • Bickel et al. (1993) Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models, volume 4. Springer.
  • Chang et al. (2007) Chang, I.-S., Chien, L.-C., Hsiung, C. A., Wen, C.-C., and Wu, Y.-J. (2007). Shape restricted regression with random Bernstein polynomials. Lecture Notes–Monograph Series, pages 187–202.
  • Chen and Samworth (2016) Chen, Y. and Samworth, R. J. (2016). Generalized additive and index models with shape constraints. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(4), 729–754.
  • Cheng (2009) Cheng, G. (2009). Semiparametric additive isotonic regression. Journal of Statistical Planning and Inference, 139(6), 1980–1991.
  • Cox (1972) Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–202.
  • Cox (1975) Cox, D. R. (1975). Partial likelihood. Biometrika, 62(2), 269–276.
  • Deng et al. (2023) Deng, G., Xu, G., Fu, Q., Wang, X., and Qin, J. (2023). Active-set algorithm-based statistical inference for shape-restricted generalized additive Cox regression models. Journal of Statistical Computation and Simulation, 93(3), 416–441.
  • Deng and Zhang (2020) Deng, H. and Zhang, C.-H. (2020). Isotonic regression in multi-dimensional spaces and graphs. The Annals of Statistics, 48(6), 3672–3698.
  • Feng et al. (2022) Feng, O. Y., Chen, Y., Han, Q., Carroll, R. J., and Samworth, R. J. (2022). Nonparametric, tuning-free estimation of S-shaped functions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(4), 1324–1352.
  • Groeneboom and Hendrickx (2017) Groeneboom, P. and Hendrickx, K. (2017). The nonparametric bootstrap for the current status model. Electronic Journal of Statistics, 11(2), 3446–3484.
  • Groeneboom and Jongbloed (2014) Groeneboom, P. and Jongbloed, G. (2014). Nonparametric Estimation under Shape Constraints: Estimators, Algorithms and Asymptotics. London, UK: Cambridge University Press.
  • Guntuboyina and Sen (2015) Guntuboyina, A. and Sen, B. (2015). Global risk bounds and adaptation in univariate convex regression. Probability Theory and Related Fields, 163(1), 379–411.
  • Hastie and Tibshirani (1986) Hastie, T. J. and Tibshirani, R. J. (1986). Generalized additive models. Statistical Science, 1(3), 297–310.
  • Hastie and Tibshirani (1990) Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. London: Chapman and Hall.
  • Heller (2001) Heller, G. (2001). The Cox proportional hazards model with a partly linear relative risk function. Lifetime Data Analysis, 7, 255–277.
  • Horowitz and Lee (2017) Horowitz, J. L. and Lee, S. (2017). Nonparametric estimation and inference under shape restrictions. Journal of Econometrics, 201(1), 108–126.
  • Huang (1999) Huang, J. (1999). Efficient estimation of the partly linear additive Cox model. The Annals of Statistics, 27(5), 1536–1563.
  • Huang (2002) Huang, J. (2002). A note on estimating a partly linear model under monotonicity constraints. Journal of Statistical Planning and Inference, 107(1-2), 343–351.
  • Kuchibhotla et al. (2023) Kuchibhotla, A. K., Patra, R. K., and Sen, B. (2023). Semiparametric efficiency in convexity constrained single index model. Journal of the American Statistical Association, 118(541), 272–286.
  • Matzkin (1991) Matzkin, R. L. (1991). Semiparametric estimation of monotone and concave utility functions for polychotomous choice models. Econometrica, 59(5), 1315–1327.
  • O’Sullivan (1993) O’Sullivan, F. (1993). Nonparametric estimation in the Cox model. The Annals of Statistics, 21(1), 124–145.
  • Qin et al. (2014) Qin, J., Garcia, T. P., Ma, Y., Tang, M.-X., Marder, K., and Wang, Y. (2014). Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. The Annals of Applied Statistics, 8(2), 1182–1208.
  • Qin et al. (2021) Qin, J., Deng, G., Ning, J., Yuan, A., and Shen, Y. (2021). Estrogen receptor expression on breast cancer patients’ survival under shape-restricted Cox regression model. The Annals of Applied Statistics, 15(3), 1291–1307.
  • Rong et al. (2024) Rong, Y., Zhao, S. D., Zheng, X., and Li, Y. (2024). Kernel Cox partially linear regression: Building predictive models for cancer patients’ survival. Statistics in Medicine, 43(1), 1–15.
  • Royston and Altman (2013) Royston, P. and Altman, D. G. (2013). External validation of a Cox prognostic model: principles and methods. BMC Medical Research Methodology, 13, 1–15.
  • Sasieni (1992) Sasieni, P. (1992). Information bounds for the conditional hazard ratio in a nested family of regression models. Journal of the Royal Statistical Society: Series B (Methodological), 54(2), 617–635.
  • Sleeper and Harrington (1990) Sleeper, L. A. and Harrington, D. P. (1990). Regression splines in the cox model with application to covariate effects in liver disease. Journal of the American Statistical Association, 85(412), 941–949.
  • Varian (1984) Varian, H. R. (1984). The nonparametric approach to production analysis. Econometrica, 52(3), 579–597.
  • Wang and Ghosh (2012) Wang, J. and Ghosh, S. K. (2012). Shape restricted nonparametric regression with bernstein polynomials. Computational Statistics & Data Analysis, 56(9), 2729–2741.
  • Zhong et al. (2022) Zhong, Q., Mueller, J., and Wang, J.-L. (2022). Deep learning for the partially linear Cox model. The Annals of Statistics, 50(3), 1348–1375.