Abstract

We discuss kernel estimation of a density function based on censored data when the survival times and the censoring times form stationary negatively associated (NA) sequences. Under certain regularity conditions, Berry-Esseen type bounds are derived for the kernel density estimator and the Kaplan-Meier kernel density estimator at a fixed point.

1. Introduction

Let $\{X_n; n \ge 1\}$ be a sequence of the true survival times. The random variables (r.v.s.) are not assumed to be mutually independent; it is assumed, however, that they have a common unknown continuous marginal distribution function (d.f.) $F$ and density function $f$. Let the r.v.s. $X_i$ be censored on the right by the censoring r.v.s. $Y_i$, so that one observes only $(Z_i, \delta_i)$, where $Z_i = X_i \wedge Y_i$ (here and in the sequel, $\wedge$ denotes the minimum) and $\delta_i = I(X_i \le Y_i)$ is the indicator random variable of the event $\{X_i \le Y_i\}$. In this random censorship model, the censoring times $Y_i$, $i = 1, \ldots, n$, are assumed to have the common d.f. $G$; they are also assumed to be independent of the r.v.s. $X_i$. Following the convention in the survival literature, we assume that both $X_i$ and $Y_i$ are nonnegative random variables. In contrast to statistics for complete data, we observe only the pairs $(Z_i, \delta_i)$, $i = 1, \ldots, n$, and the estimators are based on these pairs.
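The observation scheme of the random censorship model can be sketched as follows. This is a minimal illustration that assumes the usual convention (observed time is the minimum of the survival and censoring times, and the indicator flags uncensored observations). The function name `censor` is hypothetical, and the exponential samples are i.i.d. and purely illustrative; they are not an NA sequence.

```python
import random


def censor(survival_times, censoring_times):
    """Return the observed pairs (z, delta) under right censoring.

    z is the smaller of the survival and censoring time; delta is 1 when
    the survival time is observed (not censored) and 0 otherwise.
    """
    return [(min(x, y), 1 if x <= y else 0)
            for x, y in zip(survival_times, censoring_times)]


# Illustrative data only: i.i.d. exponential draws with a fixed seed.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(8)]  # hypothetical survival times
y = [rng.expovariate(0.5) for _ in range(8)]  # hypothetical censoring times
pairs = censor(x, y)
```

Only `pairs` would be available to the statistician; the lists `x` and `y` are never observed jointly.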

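The following nonparametric estimators of the distribution functions $F$ and $G$, due to Kaplan and Meier [1], are widely used on the basis of the data $(Z_i, \delta_i)$:

$$1 - \hat{F}_n(x) = \prod_{i=1}^{n} \left(1 - \frac{\delta_{(i)}}{n - i + 1}\right)^{I(Z_{(i)} \le x)}, \tag{1}$$

$$1 - \hat{G}_n(x) = \prod_{i=1}^{n} \left(1 - \frac{1 - \delta_{(i)}}{n - i + 1}\right)^{I(Z_{(i)} \le x)}, \tag{2}$$

where $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ denote the order statistics of $Z_1, Z_2, \ldots, Z_n$ and $\delta_{(i)}$ is the concomitant of $Z_{(i)}$.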
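The product-limit construction can be sketched in a few lines. This is a minimal sketch, assuming the standard Kaplan-Meier form with no ties among the observed times (consistent with continuous marginals). The function name `kaplan_meier` and its `target` argument are hypothetical, introduced only for illustration.

```python
def kaplan_meier(pairs, target="F"):
    """Return t -> product-limit estimate of a d.f. from pairs (z, delta).

    target="F" estimates the survival-time d.f. (jumps at uncensored points);
    target="G" estimates the censoring d.f. (jumps at censored points).
    Assumes there are no ties among the z values.
    """
    if target not in ("F", "G"):
        raise ValueError("target must be 'F' or 'G'")
    n = len(pairs)
    ordered = sorted(pairs, key=lambda p: p[0])  # order statistics with concomitants

    def cdf(t):
        survival = 1.0
        for i, (z, delta) in enumerate(ordered, start=1):
            if z > t:
                break
            jump = delta if target == "F" else 1 - delta
            survival *= 1.0 - jump / (n - i + 1)
        return 1.0 - survival

    return cdf


# Small hand-checkable sample: the second observation is censored.
sample = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1)]
F_hat = kaplan_meier(sample, "F")
G_hat = kaplan_meier(sample, "G")
```

With no censoring at all, the estimate for the survival-time d.f. reduces to the ordinary empirical d.f., which is a convenient sanity check.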

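We introduce the kernel density estimator

$$f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} \frac{\delta_i}{1 - G(Z_i)} K\left(\frac{x - Z_i}{h_n}\right), \tag{3}$$

where $h_n$ are bandwidths and $K$ is some kernel function. When $G$ is known, (3) can be used to estimate the common density $f$ of the lifetimes. However, in most practical cases $G$ is unknown and must be replaced by the Kaplan-Meier estimator $\hat{G}_n$, so the Kaplan-Meier kernel density estimator of $f$ is defined by

$$\hat{f}_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} \frac{\delta_i}{1 - \hat{G}_n(Z_i)} K\left(\frac{x - Z_i}{h_n}\right). \tag{4}$$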
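The two estimators can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the usual inverse-censoring-weighted form, in which each uncensored observation contributes a kernel term divided by one minus the censoring d.f. at that observation. The Epanechnikov kernel is chosen arbitrarily as a bounded kernel with compact support, and all function names are hypothetical.

```python
def epanechnikov(u):
    """A bounded probability kernel with compact support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def km_censoring_cdf(pairs):
    """Product-limit estimate of the censoring d.f. (no ties assumed)."""
    n = len(pairs)
    ordered = sorted(pairs, key=lambda p: p[0])

    def cdf(t):
        survival = 1.0
        for i, (z, delta) in enumerate(ordered, start=1):
            if z > t:
                break
            survival *= 1.0 - (1 - delta) / (n - i + 1)
        return 1.0 - survival

    return cdf


def censored_kde(x, pairs, h, censoring_cdf, kernel=epanechnikov):
    """Kernel density estimate at x using a given censoring d.f."""
    n = len(pairs)
    total = 0.0
    for z, delta in pairs:
        if delta == 1:  # censored observations contribute nothing
            total += kernel((x - z) / h) / (1.0 - censoring_cdf(z))
    return total / (n * h)


def km_kde(x, pairs, h, kernel=epanechnikov):
    """Same estimator with the censoring d.f. replaced by its estimate."""
    return censored_kde(x, pairs, h, km_censoring_cdf(pairs), kernel)
```

Skipping censored terms before dividing avoids a zero denominator when the largest observation is censored. When no observation is censored, the estimated censoring d.f. is identically zero and `km_kde` reduces to the ordinary kernel density estimator.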

There is an extensive literature on the Kaplan-Meier estimator for censored independent observations. We refer to papers by Földes and Rejtő [2], Gu and Lai [3], Gill [4], and Sun and Zhu [5]. Sun and Zhu obtained the following Berry-Esseen bound for i.i.d. censored sequences.

Theorem A. Let be a bounded probability kernel function with compact support satisfying for integer ,

Let be -order continuously differentiable and let be continuously differentiable in a neighborhood of with for . Then where denotes the standard normal distribution function, and .

However, censored dependent data appear in a number of applications. For example, repeated measurements in survival analysis follow this pattern; see Kang and Koehler [6]. In the context of censored time series analysis, Shumway et al. [7] considered (hourly or daily) measurements of the concentration of a given substance subject to some detection limits, thus being potentially censored from the right. Lecoutre and Ould-Said [8], Cai [9], and Liang and Uña-Álvarez [10] studied the convergence for stationary $\alpha$-mixing data. However, the corresponding convergence results for NA data have not been reported.

The main purpose of this paper is to study the kernel density estimator and the Kaplan-Meier kernel estimator of a density function based on censored data when the survival times and the censoring times form stationary NA sequences (see Definition 1 below). Under certain regularity conditions, Berry-Esseen type bounds are derived for the kernel density estimator and the Kaplan-Meier kernel estimator at a fixed point.

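Definition 1. Random variables $X_1, X_2, \ldots, X_n$, $n \ge 2$, are said to be negatively associated (NA) if for every pair of disjoint subsets $A_1$ and $A_2$ of $\{1, 2, \ldots, n\}$,

$$\operatorname{Cov}\bigl(f_1(X_i; i \in A_1), f_2(X_j; j \in A_2)\bigr) \le 0,$$

where $f_1$ and $f_2$ are increasing for every variable (or decreasing for every variable) such that this covariance exists. A sequence of random variables $\{X_n; n \ge 1\}$ is said to be NA if every finite subfamily is NA.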

Obviously, if $\{X_n; n \ge 1\}$ is a sequence of NA random variables and $\{f_n; n \ge 1\}$ is a sequence of nondecreasing (or nonincreasing) functions, then $\{f_n(X_n); n \ge 1\}$ is also a sequence of NA random variables.

This definition was introduced by Joag-Dev and Proschan [11]. Statistical tests depend greatly on sampling. Random sampling without replacement from a finite population is NA but not independent. NA sampling has wide applications, such as those in multivariate statistical analysis and reliability theory. Because of these applications, the limit behavior of NA random variables has received increasing attention recently. One can refer to Joag-Dev and Proschan [11] for fundamental properties, Matuła [12] for the three-series theorem, and Wu and Jiang [13, 14] for the strong convergence.
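The sampling-without-replacement example can be checked numerically in its simplest consequence: taking the two monotone functions in the NA definition to be coordinate projections, two distinct draws must have nonpositive covariance. The sketch below computes that covariance exactly by enumeration. It verifies only this one consequence, not the full NA property, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def population_variance(population):
    """Exact variance of one uniform draw from the finite population."""
    n = len(population)
    mean = Fraction(sum(population), n)
    return sum((Fraction(v) - mean) ** 2 for v in population) / n


def without_replacement_cov(population):
    """Exact covariance of the first two draws without replacement."""
    ordered_pairs = list(permutations(population, 2))  # equally likely outcomes
    m = len(ordered_pairs)
    e1 = Fraction(sum(a for a, _ in ordered_pairs), m)
    e2 = Fraction(sum(b for _, b in ordered_pairs), m)
    e12 = Fraction(sum(a * b for a, b in ordered_pairs), m)
    return e12 - e1 * e2


population = [1, 2, 3, 4]
cov = without_replacement_cov(population)
```

For a population of size N with variance s, the classical identity gives a covariance of -s/(N - 1), which is strictly negative unless all values coincide; the draws are therefore dependent.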

2. Main Results

In what follows, let be the d.f. of the ’s, . Since the sequences and are independent, it follows that .

Define (possibly infinite) times , , and by Then, .
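For orientation, the relations described in the two preceding paragraphs usually take the following form in the random censorship literature. The symbols $H$, $\bar{L}$, and $\tau_L$ are assumptions made here for illustration, since the original formulas are not recoverable from the text.

```latex
% Assumed notation: \bar{L} = 1 - L for a d.f. L, and H is the d.f. of the observed times.
\bar{H}(t) = P(X_1 \wedge Y_1 > t) = P(X_1 > t)\, P(Y_1 > t) = \bar{F}(t)\, \bar{G}(t),
\qquad
\tau_L = \inf\{ t : L(t) = 1 \} \quad (L = F, G, H),
\qquad
\tau_H = \tau_F \wedge \tau_G .
```

The first identity uses only the independence of the survival and censoring sequences; the last follows because the product of the two survival functions vanishes as soon as either factor does.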

We give the following four lemmas, which are helpful in proving our theorems.

Lemma 2 (Chang and Rao, [15]). Let and be random variables, then for any here and in the sequel, where denotes the standard normal distribution function.
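In the form in which this lemma is commonly quoted in the Berry-Esseen literature, the inequality reads as below. The symbols $X$, $Y$, $a$, and $u$ are assumptions made here, because the original display is not recoverable from the text.

```latex
% Commonly quoted form, for any a > 0:
\sup_{u} \bigl| P(X + Y \le u) - \Phi(u) \bigr|
  \le \sup_{u} \bigl| P(X \le u) - \Phi(u) \bigr|
  + \frac{a}{\sqrt{2\pi}} + P\bigl( |Y| > a \bigr).
```

The term $a/\sqrt{2\pi}$ comes from the bound $\sup_u |\Phi(u + a) - \Phi(u)| \le a/\sqrt{2\pi}$, so a perturbation $Y$ that is small in probability changes the normal approximation error only slightly.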

Lemma 3 (Su et al. [16, Theorem 1]). Let be a sequence of NA r.v.s. with zero means and , and . Then for , where depends only on .

Lemma 4. Let be a sequence of NA r.v.s. with continuous d.f. , and let be the empirical d.f. based on the segments . Then

Proof. The proof is similar to that of Lemma 4 in Yang [17] and is omitted.

Lemma 5 (Wu and Chen [18, Theorem 1.3]). Let and be two sequences of NA r.v.s. Suppose that the sequences and are independent. Then for any ,

In order to formulate our main results, we now list some assumptions.
() and are two sequences of stationary NA random variables, and and are independent.
() Suppose that , , and and have bounded derivative in a neighborhood of .
() For all integers , the conditional distribution , given , has a density , and for all , for and some , where represents a neighborhood of .
() The kernel is a bounded derivative function with for and .
() Let , , and be positive integers with where .

Remark 6. () Implies and .
Let , .

Theorem 7. Suppose that are satisfied; then where , , , .
Consider the following: where .
Furthermore, if then

Theorem 8. Assume that the conditions of Theorem 7 hold. Then Furthermore, if (16) holds, then

3. Proofs

Proof of Theorem 7. We observe that, by (3),
Let , , , where and then By (20), We first estimate , , and . Obviously, implies that and are stationary; thus,
From , , and , we obtain Hence, by , .
For and , by ,
Therefore, by , By , , , and Lemma 2.3 of Zhang [19], for ,
Thus, by and , Therefore, by the combination of , (24), (26), (28), and (30),
Similarly,
By (26), (27), , , and ,
By (25), , and , Note that for any random variables and ; from (31)–(33), Therefore, from the combination of (23) and (31)–(34), it follows that Thus, (14) holds.
Now, we prove (15). Let , , , . Then, . According to Lemma 2, (14), (20), (32), and (33), we have
Let , be independent random variables with the same distribution as for . Put , . Obviously, Note that and from (20) and (24). By (14), (30), (32), and (33), Note that , , are independent random variables, and . Therefore, by (from (39)), (14), and the Berry-Esseen inequality (cf. Petrov [20, page 154, Theorem 5.7]), there exists some constant such that
Similar to (26), we can get and . It is easy to see from Property P7 of Joag-Dev and Proschan [11] that is also a sequence of NA r.v.s., so by using Lemma 3, we have
Assume that and are the characteristic functions of and , respectively. By the Esseen inequality (cf. Petrov [20, page 146, Theorem 5.3]), for any , there exists some constant such that
By Theorem 10 in Newman [21], (14), and (30), Therefore, On applying (39)–(41), we have Thus, Choosing , then by (42)–(46), Therefore, combining (37)–(39), (41), and (47), we see that (15) holds.
Finally, we prove (17). By Lemma 2 and (15), for any ,
Applying (14), , , and the mean value theorem, there exists a constant , such that Hence, there exists a constant sufficiently large such that . Let in (48); then . Therefore, by (48), (17) holds.

Proof of Theorem 8. Using (15) and Lemma 2,
Let be the empirical d.f. of . Then, by (2),
Thus, by Lemmas 4 and 5, for ,
Using (14), we get
Therefore, (18) follows from (50) and (53).
Using (18), similar to the proof of (17), we can prove (19). This completes the proof of Theorem 8.

Acknowledgments

The authors are very grateful to the referees and the editors for their valuable comments and helpful suggestions, which improved the clarity and readability of the paper. This work was supported by the National Natural Science Foundation of China (11061012), the Program to Sponsor Teams for Innovation in the Construction of Talent Highlands in Guangxi Institutions of Higher Learning ((2011) 47), and the Support Program of the Guangxi China Science Foundation (2012GXNSFAA053010 and 2013GXNSFDA019001).