Abstract

This paper proposes a new class of mean residual life regression models, called the mean residual life transformation model. The link function is assumed to be unknown and increasing in its second argument, and it is not required to be differentiable. The mean residual life transformation model encompasses, among others, the proportional mean residual life model and the additive mean residual life model. We present estimation procedures based on maximum rank correlation and establish the asymptotic properties of the resulting estimators. The asymptotic variance can be consistently estimated by a resampling method that repeatedly perturbs the U-statistic objective function, which avoids the usual sandwich estimator. Monte Carlo simulations show good finite sample performance, and the estimators are illustrated with the Oscar data set.

1. Introduction

The mean residual life function at time $t$ is the remaining life expectancy of subjects who have survived up to $t$. It is defined as

$$m(t) = E(T - t \mid T > t) = \frac{\int_t^{\infty} S(u)\,du}{S(t)}, \qquad (1)$$

where $T$ is the time from onset to the failure event of interest and $S(t)$ is the survival function of $T$. The mean residual life function is of interest in many fields of application, such as industrial reliability, actuarial research, and survival analysis. For example, when $T$ is a patient's time from cancer onset to death, $m(t)$ is the remaining life expectancy of patients who have survived up to $t$; patients and physicians both care about how long cancer patients can survive. When $T$ is the minimum score of the Graduate Record Examination foreign language test for a school graduate program, $m(300)$ is the average excess score for those scoring more than 300 points. From (1), with $m(t \mid Z) = E(T - t \mid T > t, Z)$, it is easy to derive the conditional survival function of $T$ given covariate $Z$:

$$S(t \mid Z) = \frac{m(0 \mid Z)}{m(t \mid Z)} \exp\left\{-\int_0^t \frac{du}{m(u \mid Z)}\right\}. \qquad (2)$$
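As a numerical illustration of the definition of the mean residual life function, the following Python sketch approximates it by integrating a survival function over a truncated grid. The function name `mean_residual_life`, the truncation point `upper`, and the exponential example are assumptions made for illustration only. For an exponential lifetime with rate 0.5, the mean residual life is constant and equal to 2, which gives a simple check.

```python
import numpy as np

def mean_residual_life(t, survival, upper=200.0, n_grid=200_001):
    """Approximate m(t) = int_t^inf S(u) du / S(t) with the trapezoidal rule.

    The integral is truncated at `upper` (an assumption of this sketch), so
    `upper` must be large enough that S(upper) is negligible.
    """
    u = np.linspace(t, upper, n_grid)
    s = survival(u)
    integral = np.sum((s[1:] + s[:-1]) * np.diff(u)) / 2.0
    return float(integral / survival(t))

# Hypothetical example: exponential lifetime with rate 0.5 (constant MRL of 2).
rate = 0.5

def exp_survival(u):
    return np.exp(-rate * np.asarray(u, dtype=float))

mrl_at_0 = mean_residual_life(0.0, exp_survival)
mrl_at_3 = mean_residual_life(3.0, exp_survival)
```

The memoryless property of the exponential distribution is what makes the two values agree; for other distributions the function generally varies with $t$.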

How covariates affect the remaining life expectancy has attracted much interest (Song et al., Wang et al., Lin et al., and references therein [1-3]). For example, researchers are interested in the relation between the remaining survival time of cancer patients and age, smoking, and various treatments. Semiparametric statistical models have been widely used to assess such relations. For instance, to relate the mean residual life to covariates, Oakes and Dasu [4] studied the proportional mean residual life model:

$$m(t \mid Z) = m_0(t)\exp(\beta_0^{\top} Z), \qquad (3)$$

where $m_0(t)$ is the unknown baseline mean residual life function, the covariate $Z$ is a $p$-dimensional vector, and $\beta_0$ is the true parameter. When the covariate is a binary indicator, the proportional mean residual life model compares groups through a ratio. For example,

$$\frac{m(t \mid Z = 1)}{m(t \mid Z = 0)} = \exp(\beta_0), \qquad (4)$$

where $Z = 1$ denotes combined treatment and $Z = 0$ denotes no treatment. For instance, $\exp(\beta_0) = 3$ means that the mean remaining life under combined treatment is 3 times that under no treatment.

For model (3) without censoring, Maguluri and Zhang [5] introduced an estimation method for $\beta_0$. Regression methods for model (3) with right-censored data were studied by Chen and Cheng [6], who proposed estimating equations for $m_0(t)$ and $\beta_0$.

Sun and Zhao [7] presented a class of mean residual life regression models, model (6), in which the link function is prespecified and is assumed to be continuous almost everywhere and twice differentiable. For model (6) with right-censored data, they proposed estimating equations for the baseline function and the regression parameter, and they developed estimation procedures and a goodness-of-fit test.

Just as the additive hazards regression model is an attractive alternative to the Cox proportional hazards model, Chen [8] considered the additive mean residual life model:

$$m(t \mid Z) = m_0(t) + \beta_0^{\top} Z. \qquad (8)$$

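Chen [8] adapted estimating equations to estimate the parameter $\beta_0$. Model (8) explains covariate effects through the difference in mean residual life; for a binary covariate,

$$m(t \mid Z = 1) - m(t \mid Z = 0) = \beta_0. \qquad (9)$$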

In this paper, we consider a new class of mean residual life regression models given by

$$m(t \mid Z) = g\{m_0(t), \beta_0^{\top} Z\}, \qquad (10)$$

where the link function $g(\cdot, \cdot)$ is assumed to be unknown and increasing in its second argument, and it is not required to be differentiable. Model (10) is called the mean residual life transformation model. It includes the proportional mean residual life model (3) and the additive mean residual life model (8) as special cases and partially extends model (6). Model (10) is robust, but traditional methods such as the counting process approach cannot be used to estimate the unknown regression parameter, because their estimating equations depend on the derivative of the link function.

In this paper, based on maximum rank correlation estimation, we propose smoothed objective functions to estimate the unknown regression parameter of model (10) with complete data and with right-censored data. The paper is motivated in three ways. First, the consistency and asymptotic normality of such approximate estimators for the general mean residual life model have not yet been established in the literature. Second, the smoothed objective functions make the rank-based approach feasible without loss of asymptotic efficiency. Third, the censoring variable is allowed to be correlated with the covariates rather than independent of them.

The rest of this article is organized as follows. In the next section, we give estimation procedures for model (10) with complete data. Section 3 considers the extension to right-censored data. Section 4 reports Monte Carlo simulations that investigate the finite sample properties, together with an application to the Oscar data. Section 5 discusses future research.

2. Estimation

Without loss of generality, we suppose that the survival time $T$ is a positive random variable that is absolutely continuous, with a density with respect to the Lebesgue measure. In the mean residual life transformation model (10), the link function is unknown; thus, the estimating-equation approach cannot be used to estimate the parameter of model (10). Under model (10) with complete data, we estimate the parameter by maximizing a smoothed objective function over a subset of the $p$-dimensional parameter space. In this objective function, the smoothing parameter is a strictly positive, decreasing sequence, and $I(\cdot)$ is the indicator function.
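To make the idea of a smoothed rank-based objective concrete, the following Python sketch implements one generic version. It is an illustration under stated assumptions, not the paper's exact objective function: the logistic smoother, the bandwidth value 0.1, the grid search, the simulated data, and the names `smoothed_rank_objective` and `beta2_hat` are all hypothetical choices. The first coefficient is fixed at 1 for identification, and the second is found by maximizing the objective over a grid.

```python
import numpy as np

def smoothed_rank_objective(beta, times, covariates, bandwidth):
    """Average over pairs i != j of I(T_i > T_j) * K((Z_i - Z_j)'beta / h)."""
    index = covariates @ beta
    time_gt = times[:, None] > times[None, :]        # I(T_i > T_j); False on the diagonal
    scaled = (index[:, None] - index[None, :]) / bandwidth
    kernel = 0.5 * (1.0 + np.tanh(0.5 * scaled))     # logistic CDF (assumed smoother)
    n = len(times)
    return float(np.sum(time_gt * kernel)) / (n * (n - 1))

# Hypothetical data: the survival time is stochastically increasing in Z'beta_0.
rng = np.random.default_rng(0)
n = 300
Z = rng.normal(size=(n, 2))
beta_true = np.array([1.0, 1.0])                     # first component normalized to 1
T = np.exp(Z @ beta_true + 0.5 * rng.normal(size=n))

# Grid search over the free (second) coefficient.
grid = np.linspace(0.0, 2.0, 41)
values = [smoothed_rank_objective(np.array([1.0, b]), T, Z, bandwidth=0.1) for b in grid]
beta2_hat = float(grid[int(np.argmax(values))])
```

A grid search is used only because the example has a single free coefficient; with more covariates a numerical optimizer would be the natural replacement, which the smoothing makes practical because the objective is differentiable in the parameter.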

To obtain the consistency of the estimator, the necessary and sufficient condition in the following proposition is important.

We list four regularity assumptions for rank estimation below. All of them are standard and are also used elsewhere in the literature.

Assumption 1. (Covariates): the support of the regressor $Z$ is not contained in a proper linear subspace of $\mathbb{R}^p$. The first component of $Z$ has an everywhere positive Lebesgue density, conditional on the other components.

Assumption 2. (Normalization): the parameter space is a compact subset of $\{\beta \in \mathbb{R}^p : \beta_1 = 1\}$.

Assumptions 1 and 2 are sufficient to identify the parameter $\beta_0$. To obtain asymptotic normality, we add Assumptions 3 and 4, which follow Sherman [9].

Assumption 3. Let be a neighborhood of true parameter .(i)All mixed second partial derivatives of exist on , for any .(ii)There exists an integrable function . For any and , satisfieswhere .(iii).(iv), where .(v)The dimensional matrix is negative definite.

Assumption 4. Let be a neighborhood of true parameter . Then,(i)All mixed second partial derivatives of exist on , for any .(ii)There exists an integrable function . For any and , satisfies(iii).(iv).(v)The dimensional matrix is negative definite.

Proposition 1. Under mean residual life transformation model (10) with complete data,

It is known that the parameter of interest is not identifiable without a normalization. To avoid this identifiability problem, and without loss of generality, the first component of the parameter is assumed to be 1. Recall that denotes an observation from the distribution on the set . For each in , define

The consistency and asymptotic normality of the estimator are given by Theorem 1 below.

Proof of Proposition 1. where the fourth equality holds since the hazard function . Then, Because the derivative of is Because is an increasing function in , that is, , then Thus, is an increasing function of . Because , where is a cumulative hazard function. Hence, is a decreasing function of , that is, Therefore,

Theorem 1. Under Assumptions 1 and 2,where denotes convergence in probability. Under Assumptions 1-3,where denotes convergence in distribution and has a normal distribution with

From Theorem 1, the asymptotic variance of the smoothed estimator is the same as that of the nonsmoothed estimator. Smoothing has a further advantage: the estimation no longer involves maximizing indicator functions. The asymptotic variance can be estimated by a resampling method, which was studied by Cai et al. [10]. Let the positive perturbation random variables, one per observation, have mean and variance both equal to 1. Define

Theorem 2. Under Assumptions 1-3,where follows and and are given in Theorem 1.

From Theorem 2, the asymptotic variance of the estimator can be approximated by the sample variance of its perturbed counterparts. The proof of Theorem 2 is omitted because it is similar to that of Cai et al. [10].
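The resampling idea can be sketched in Python as follows. This is a minimal illustration under assumptions, not the paper's exact procedure: the pairwise perturbation weight $\xi_i + \xi_j$, the logistic smoother, the bandwidth, the grid, and the simulated data are all hypothetical choices. The weights are standard exponential, so their mean and variance are both 1. Because the pair weights are additive, each perturbed objective reduces to a weighted sum of the row and column sums of the pairwise terms, which keeps the resampling loop cheap.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
Z = rng.normal(size=(n, 2))
T = np.exp(Z @ np.array([1.0, 1.0]) + 0.5 * rng.normal(size=n))   # hypothetical data

grid = np.linspace(0.0, 2.0, 81)      # candidate values of the free coefficient
bandwidth = 0.1                       # assumed smoothing parameter
time_gt = T[:, None] > T[None, :]     # I(T_i > T_j)

# Pairwise terms for every grid point, summarized by their row and column sums.
row_sums = np.empty((grid.size, n))
col_sums = np.empty((grid.size, n))
for g, b in enumerate(grid):
    index = Z @ np.array([1.0, b])
    scaled = (index[:, None] - index[None, :]) / bandwidth
    pair = time_gt * 0.5 * (1.0 + np.tanh(0.5 * scaled))
    row_sums[g] = pair.sum(axis=1)
    col_sums[g] = pair.sum(axis=0)

# Unperturbed estimate of the free coefficient.
beta2_hat = float(grid[np.argmax(row_sums.sum(axis=1))])

# With weight (xi_i + xi_j) on pair (i, j), the perturbed objective at grid
# point g equals sum_i xi_i * (row_sums[g, i] + col_sums[g, i]).
n_resamples = 200
xi = rng.exponential(scale=1.0, size=(n_resamples, n))    # mean 1, variance 1
perturbed_objective = xi @ (row_sums + col_sums).T         # (n_resamples, grid.size)
beta2_star = grid[np.argmax(perturbed_objective, axis=1)]

# Standard error estimate: sample standard deviation of the perturbed estimates.
se_hat = float(np.std(beta2_star, ddof=1))
```

The sample standard deviation of `beta2_star` plays the role of the estimated standard error, so no derivative or density estimate of the kind needed for a sandwich formula is required.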

3. Extension

For right-censored data, we observe $(Y_i, \delta_i, Z_i)$, $i = 1, \dots, n$, where $Y_i = \min(T_i, C_i)$, $\delta_i = I(T_i \le C_i)$, and $C_i$ are right-censoring random variables.

Under model (10) with right-censored data, we estimate the parameter by maximizing a smoothed objective function (28) over a subset of the parameter space, where the smoothing parameter is again a strictly positive, decreasing sequence. The objective function (28) is inspired by Khan and Tamer [11] and Song et al. [12].
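A censored-data version of a smoothed rank objective can be sketched in Python as follows. The sketch follows the general idea of partial rank estimation, in which a pair is compared only when the smaller observed time is an actual failure; the specific pair weighting, the logistic smoother, the bandwidth, the censoring distribution, and all names are assumptions of this illustration rather than the paper's objective function (28).

```python
import numpy as np

def smoothed_censored_rank_objective(beta, observed, delta, covariates, bandwidth):
    """Average over pairs i != j of delta_j * I(Y_i > Y_j) * K((Z_i - Z_j)'beta / h)."""
    index = covariates @ beta
    # Pair (i, j) is used only when the smaller observed time Y_j is a failure.
    comparable = (observed[:, None] > observed[None, :]) & (delta[None, :] == 1)
    scaled = (index[:, None] - index[None, :]) / bandwidth
    kernel = 0.5 * (1.0 + np.tanh(0.5 * scaled))     # logistic CDF (assumed smoother)
    n = len(observed)
    return float(np.sum(comparable * kernel)) / (n * (n - 1))

# Hypothetical right-censored data.
rng = np.random.default_rng(2)
n = 300
Z = rng.normal(size=(n, 2))
T = np.exp(Z @ np.array([1.0, 1.0]) + 0.3 * rng.normal(size=n))   # latent survival times
C = np.exp(1.0 + rng.normal(size=n))                               # censoring times
Y = np.minimum(T, C)                                               # observed times
delta = (T <= C).astype(int)                                       # 1 = failure observed
censoring_rate = float(1.0 - delta.mean())

# Grid search over the free coefficient, with the first coefficient fixed at 1.
grid = np.linspace(0.0, 2.0, 41)
values = [
    smoothed_censored_rank_objective(np.array([1.0, b]), Y, delta, Z, bandwidth=0.1)
    for b in grid
]
beta2_hat = float(grid[int(np.argmax(values))])
```

Restricting comparisons to pairs whose smaller time is uncensored is what lets the ordering of the latent survival times be read off the observed data.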

To obtain consistency, identification (that is, a unique maximum) is an important condition. Following Khan and Tamer [11], define

Proposition 2. Under mean residual life transformation model (10) with censoring data,Recall that denotes an observation from the distribution on the set . For each in , defineThe consistency and asymptotic normality of are given by Theorem 3.

Proof of Proposition 2.
First of all, we calculate the conditional expectation of given , which can be decomposed into two expressions: The two expressions above are calculated as follows: Thus, Similarly, the conditional expectation of given is The rest of the proof of Proposition 2 is the same as that of Proposition 1.

Theorem 3. Under Assumptions 1 and 2,where denotes convergence in probability. Under Assumptions 1, 2, and 4,where denotes convergence in distribution and has a normal distribution with

From Theorem 3, the asymptotic variance of the smoothed estimator is the same as that of the nonsmoothed estimator. Smoothing has a further advantage: the estimation no longer involves maximizing indicator functions. The asymptotic variance can be estimated by a resampling method, which was studied by Cai et al. [10]. Let the positive perturbation random variables, one per observation, have mean and variance both equal to 1. Definewhere

Proof of Theorem 3.
We consider the nonsmooth objective function of : Define Then, that is, First of all, we prove the consistency of .
Identification ( has unique maximum at ) follows from Proposition 2 and Assumption 1. We omit the details, which follow from identical steps used in proving Theorem 2.1 (Khan and Tamer [11]).
Obviously, is a degenerate U-statistic of order 2. Because the objective function is composed of indicator functions, is Euclidean with an envelope. We can derive from Corollary 7 of Sherman [9] that Note that the conclusion is stronger than .
By Assumption 1, it is easy to obtain that , where . Then,with probability one. Thus, continuity ( is continuous about ) follows directly from the dominated convergence theorem.
Because of Assumption 2 (compact parameter space), identification, uniform convergence, and continuity, we obtain .
In order to obtain , it suffices to show that . For any , Note that By Theorem 7 of Nolan and Pollard [13], it is easy to obtain that According to Assumption 1, as . Hence, Define Because is a U-statistic of order 2, it is easy to obtain that by the Hoeffding decomposition: Similarly to the proof of Theorem 4 of Sherman [9], we obtain According to Assumption 4, we obtain Hence, we obtain uniformly in neighborhoods of , where , , Then, by Theorem 2 of Sherman [9], where denotes convergence in distribution and has a normal distribution .

Theorem 4. Under Assumptions 1, 2, and 4,where follows and and are given in Theorem 3.

4. Numerical Examples

4.1. Monte Carlo Results

This section examines, through simulation studies, the finite sample properties of the parameter estimators for model (10) with complete data and with right-censored data. Simulation results are reported for sample sizes of 100 and 400 observations, with 500 replications. We choose the smoothing parameter . The perturbation weights follow a standard exponential distribution, with 500 resampling replications. The empirical bias (BIAS), the empirical standard deviation (SD), the average of the estimated standard errors (SE), and the empirical coverage probability of the 95% confidence interval (CP) are reported in Tables 1-4.
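The four summary measures can be computed from the replication output as in the following Python sketch. The function name `summarize_replications`, the use of a normal-approximation interval with critical value 1.96, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def summarize_replications(estimates, standard_errors, true_value, z=1.96):
    """Return BIAS, SD, SE, and CP for one parameter across simulation replications."""
    estimates = np.asarray(estimates, dtype=float)
    standard_errors = np.asarray(standard_errors, dtype=float)
    lower = estimates - z * standard_errors
    upper = estimates + z * standard_errors
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "BIAS": float(estimates.mean() - true_value),   # empirical bias
        "SD": float(estimates.std(ddof=1)),             # empirical standard deviation
        "SE": float(standard_errors.mean()),            # average estimated standard error
        "CP": float(covered.mean()),                    # empirical coverage probability
    }

# Toy input: four replications with a true value of 1.0.
summary = summarize_replications([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
```

In the toy input, three of the four intervals contain the true value, so the coverage entry is 0.75; agreement between SD and SE is what indicates that the variance estimator is well calibrated.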

Our design for model (10) is specified bywhere and is assumed to be an increasing function in . The following subsections consider three functional forms for link function and different covariates .

4.1.1. Complete Data

First, we want to evaluate the performance of parameter estimation for model (10) with complete data. The first link function and different covariates which we consider are as follows:where , or 1, is generated from a uniform , follows a chi-square distribution with 1 degree of freedom, and is independent of . According to the conditional distribution function , equation (2) and model (53), then .
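Generating survival times from a mean residual life model relies on inverting the conditional survival function. The following Python sketch shows this for an additive model with a linear baseline, $m(t \mid Z) = at + b$ with $b$ depending on the covariates, for which the inversion formula gives the closed form $S(t \mid Z) = \{b / (at + b)\}^{1 + 1/a}$. The linear baseline, the slope 0.25, the intercept 1, the coefficients, and the function name `simulate_linear_mrl` are hypothetical choices and are not taken from the paper's design.

```python
import numpy as np

def simulate_linear_mrl(offset, slope, rng):
    """Draw T with mean residual life m(t) = slope * t + offset by inverting S(t)."""
    offset = np.asarray(offset, dtype=float)
    u = 1.0 - rng.uniform(size=offset.shape)     # uniform on (0, 1]
    power = 1.0 + 1.0 / slope                    # S(t) = (offset / (slope*t + offset)) ** power
    return (offset / slope) * (u ** (-1.0 / power) - 1.0)

rng = np.random.default_rng(3)
n = 200_000
slope = 0.25

# Check with a constant offset of 2: m(0) = 2 and m(4) = 0.25 * 4 + 2 = 3.
times = simulate_linear_mrl(np.full(n, 2.0), slope, rng)
empirical_mrl_0 = float(times.mean())
survivors = times[times > 4.0]
empirical_mrl_4 = float((survivors - 4.0).mean())

# Additive covariate effect: offset = 1 + Z1 + Z2 (hypothetical coefficients).
Z1 = rng.uniform(size=n)
Z2 = rng.chisquare(1, size=n)
times_cov = simulate_linear_mrl(1.0 + Z1 + Z2, slope, rng)
```

The two empirical averages approximate $m(0)$ and $m(4)$ for the constant-offset sample, which checks that the inversion reproduces the intended mean residual life function.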

Let denote the parameter estimator for the additive mean residual life model (53) with complete data proposed by Chen [8]. Table 1 shows the simulation results of the proposed estimator and Chen's estimator for the additive model (53) with complete data. The biases and standard errors of both estimators decrease as the sample size increases from 100 to 400, and both estimators perform well at these sample sizes. The averages of the estimated standard errors are close to the empirical standard deviations, and the empirical coverage probabilities are approximately 95%. The two estimators differ in bias, and the standard errors of the proposed estimator are larger than those of Chen's estimator, probably because the mean residual life transformation model (10) is more general than the additive model (8). The results in Table 1 indicate that parameter estimation for the mean residual life transformation model (10) with complete data is robust and reasonable.

4.1.2. Censoring Data

Second, we want to evaluate the performance of parameter estimation for model (10) with right censoring data. The second link function and different covariates which we consider are as follows:where , , follows a binary distribution of 0 or 1 with equal probabilities of 0.5, follows a standard normal distribution, and is independent of . Similar to the above section, we calculate survival time for model (61). The right censoring variable follows an exponential distribution, where the exponential parameter is selected to yield the censoring rate which is approximately 10%, 30%, and 60%, respectively.

Let be the parameter estimator for the proportional mean residual life model (3) with right-censored data studied by Chen and Cheng [6]. The simulation results of the proposed estimator and the estimator of Chen and Cheng for the proportional model with right-censored data are shown in Table 2. The mean biases and standard errors of both estimators increase when the censoring rate increases or the sample size decreases. The averages of the estimated standard errors are close to the empirical standard deviations, and the empirical coverage probabilities are approximately 95%. The two estimators differ in bias, and the standard errors of the proposed estimator are larger than those of the estimator of Chen and Cheng, probably because the mean residual life transformation model (10) is more general than the proportional mean residual life model (3). The results in Table 2 indicate that the proposed estimator is comparable with the estimator of Chen and Cheng.

4.1.3. Nondifferentiable Link Function

Third, we evaluate the performance of parameter estimation for model (10) with right-censored data and a nondifferentiable link function. The third link function and the covariates that we consider are as follows:

where , , , follows a chi-square distribution with 1 degree of freedom, is generated from a uniform , and is independent of .

The simulation results for model (62) with right-censored data and a nondifferentiable link function are summarized in Table 3. Table 3 shows that the proposed estimator performs well for the finite sample sizes 100 and 400. The results are in accordance with the theory; that is, the estimator is asymptotically unbiased and normal. The mean biases and standard errors decrease as the sample size increases. The empirical coverage probabilities of the 95% confidence intervals are close to the nominal level. As the censoring rate increases, the mean biases and standard errors increase. Table 3 indicates that the proposed estimator is suitable for model (10) with right-censored data when the link function is not differentiable.

4.2. Application to Oscar Data

Redelmeier and Singh [14] compiled data on the Oscar Awards from 1929 to 2000, which list 766 nominees. We treat the data set as right-censored. Because 327 nominees had died by the end of 2001, the censoring ratio is about 57.3% (439 of 766). Han et al. [15] describe the Oscar data in detail. Whether winning an Oscar Award increases life expectancy is an interesting question.

The total number of films in a career, the number of times the person was nominated for an Oscar (Num.), and whether the person won an Oscar (Id.) are chosen as covariates. Intuitively, life expectancy is positively correlated with the total number of films in a career. To identify the parameter, we fix the coefficient of the total number of films in a career at 1. The perturbation random variables follow a standard exponential distribution with 500 replications, and we choose . Because the survival times of the nominees are right-censored, we apply the proposed estimator for the mean residual life transformation model (10) and the estimator for the proportional mean residual life model (3) with censored data.

In Table 4, the estimated coefficients of the winning indicator (Id.) are positive under both estimators, but the corresponding confidence intervals contain zero. Thus, the data do not provide evidence that performers who won an Oscar live longer than those who did not.

5. Discussion

The mean residual life is of interest in many fields of application, such as industrial reliability, actuarial research, and survival analysis. This paper presents a general class of mean residual life regression models, which contains the proportional mean residual life model and partially extends the class of mean residual life regression models studied by Sun and Zhao [7]. The link function is assumed to be unknown and increasing in its second argument, and it is not required to be differentiable. Traditional methods such as the counting process approach cannot be used to estimate the unknown regression parameter. Based on maximum rank correlation estimation, we propose estimation procedures for model (10) with complete data and with right-censored data.

In addition to right-censored data, length-biased data often appear in survival analysis. For example, in unemployment surveys, short spells of unemployment are often not registered with the unemployment center. If the length bias is not corrected in the modeling process, the duration of unemployment will be overestimated. Therefore, one interesting direction is parameter estimation for model (10) with length-biased data. Another is to construct an estimator of the link function. We leave these possible extensions for future research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (11601083 and U1805263) and the Program for Probability and Statistics: Theory and Application (IRTL1704), and Innovative Research Team in Science and Technology in Fujian Province University (IRTSTFJ).