A Berry-Esseen Type Bound in Kernel Density Estimation for Negatively Associated Censored Data

Wu, Qunying; Chen, Pingyan

doi:https://doi.org/10.1155/2013/541250

Journal of Applied Mathematics


Research Article | Open Access

Volume 2013 | Article ID 541250 | https://doi.org/10.1155/2013/541250


Qunying Wu¹ and Pingyan Chen²

Academic Editor: XianHua Tang

Received 19 Feb 2013

Accepted 11 Jul 2013

Published 28 Aug 2013

Abstract

We discuss the kernel estimation of a density function based on censored data when the survival and the censoring times form stationary negatively associated (NA) sequences. Under certain regularity conditions, Berry-Esseen type bounds are derived for the kernel density estimator and the Kaplan-Meier kernel density estimator at a fixed point $x$.

1. Introduction

Let $\{X_n; n \ge 1\}$ be a sequence of the true survival times. The random variables (r.v.s.) are not assumed to be mutually independent; it is assumed, however, that they have a common unknown continuous marginal distribution function (d.f.) $F$ and density function $f$. Let the r.v.s. $X_i$ be censored on the right by the censoring r.v.s. $Y_i$, so that one observes only $(Z_i, \delta_i)$, where $Z_i = X_i \wedge Y_i$ and $\delta_i = I(X_i \le Y_i)$; here and in the sequel, $a \wedge b = \min(a, b)$ and $I(A)$ is the indicator random variable of the event $A$. In this random censorship model, the censoring times $Y_i$, $i = 1, \ldots, n$, are assumed to have the common d.f. $G$; they are also assumed to be independent of the r.v.s. $X_i$. Following the convention in the survival literature, we assume that both $X_i$ and $Y_i$ are nonnegative random variables. In contrast to statistics for complete data, we observe only the $n$ pairs $(Z_i, \delta_i)$, $i = 1, \ldots, n$, and the estimators are based on these pairs.

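The following nonparametric estimators of the distribution functions $F$ and $G$, due to Kaplan and Meier [1], are widely used to estimate $F$ and $G$ on the basis of the data $(Z_i, \delta_i)$:

$$1 - \hat{F}_n(x) = \prod_{i=1}^{n} \left[ 1 - \frac{\delta_{(i)}}{n - i + 1} \right]^{I(Z_{(i)} \le x)}, \tag{1}$$

$$1 - \hat{G}_n(x) = \prod_{i=1}^{n} \left[ 1 - \frac{1 - \delta_{(i)}}{n - i + 1} \right]^{I(Z_{(i)} \le x)}, \tag{2}$$

where $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ denote the order statistics of $Z_1, \ldots, Z_n$ and $\delta_{(i)}$ is the concomitant of $Z_{(i)}$.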
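The product-limit construction can be sketched numerically. The following is a minimal illustration, assuming the standard textbook form of the Kaplan-Meier estimators for both the survival d.f. and the censoring d.f. and no ties among the observed times; the function name `kaplan_meier` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def kaplan_meier(z, delta, x):
    """Return (F_hat(x), G_hat(x)) from censored pairs (assumed standard form).

    z     : observed times, Z_i = min(X_i, Y_i)
    delta : indicators, 1 if the survival time was observed (X_i <= Y_i)
    x     : point at which both estimators are evaluated
    Assumes there are no ties among the observed times.
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = len(z)
    order = np.argsort(z)            # indices giving Z_(1) <= ... <= Z_(n)
    z_sorted = z[order]
    d_sorted = delta[order]          # concomitants delta_(i)
    at_risk = n - np.arange(n)       # n - i + 1 for i = 1, ..., n
    use = z_sorted <= x              # keep only factors with Z_(i) <= x
    surv_f = np.prod(1.0 - d_sorted[use] / at_risk[use])
    surv_g = np.prod(1.0 - (1.0 - d_sorted[use]) / at_risk[use])
    return 1.0 - surv_f, 1.0 - surv_g

# Four observations, the second one censored.
f_hat, g_hat = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1], 3)
print(f_hat, g_hat)
```

With no censoring (all indicators equal to 1), the first estimate reduces to the ordinary empirical d.f. and the second is identically zero, which is a quick sanity check on the construction.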

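We introduce the kernel density estimator

$$f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} \frac{\delta_i}{1 - G(Z_i)} K\!\left( \frac{x - Z_i}{h_n} \right), \tag{3}$$

where $h_n$ are bandwidths and $K$ is some kernel function. When $G$ is known, (3) can be used to estimate the common density $f$ of the lifetimes. However, in most practical cases $G$ is unknown and must be replaced by the Kaplan-Meier estimator $\hat{G}_n$, so the Kaplan-Meier kernel density estimator of $f$ is defined by

$$\hat{f}_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} \frac{\delta_i}{1 - \hat{G}_n(Z_i)} K\!\left( \frac{x - Z_i}{h_n} \right). \tag{4}$$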
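A small sketch may help fix ideas about how censoring enters the kernel estimator. It assumes the inverse-censoring-weighted form that is usual in this literature, in which each uncensored observation is weighted by one over the censoring survival function at the observed time. The names `censored_kde` and `epanechnikov`, the choice of kernel, and the callable `censoring_cdf` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def epanechnikov(u):
    """Bounded, compactly supported probability kernel (an illustrative choice)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def censored_kde(x, z, delta, h, censoring_cdf, kernel=epanechnikov):
    """Kernel density estimate at x from censored pairs (z_i, delta_i).

    censoring_cdf : callable returning the censoring d.f. at each z_i; it may
                    be the true d.f. or an estimate of it (plug-in version).
    Assumes censoring_cdf(z_i) < 1 at every observed time.
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    weights = delta / (1.0 - censoring_cdf(z))   # censored points get weight 0
    return np.sum(weights * kernel((x - z) / h)) / (len(z) * h)

# Without censoring (censoring d.f. identically 0) this is the ordinary
# kernel density estimator.
no_censoring = lambda t: np.zeros_like(t)
print(censored_kde(0.0, [0.0], [1], 1.0, no_censoring))
```

Passing an estimate of the censoring d.f. as `censoring_cdf` gives the plug-in version; the weights blow up when that estimate approaches 1, which is why the evaluation point is kept away from the upper endpoint of the observable range.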

There is an extensive literature on the Kaplan-Meier estimator for censored independent observations. We refer to papers by Földes and Rejtő [2], Gu and Lai [3], Gill [4], and Sun and Zhu [5]. Sun and Zhu obtained the following Berry-Esseen bound for i.i.d. censored sequences.

Theorem A. Let be a bounded probability kernel function with compact support satisfying for integer ,

Let be -order continuously differentiable and let be continuously differentiable in a neighborhood of with for . Then where denotes the standard normal distribution function, and .

However, censored dependent data appear in a number of applications. For example, repeated measurements in survival analysis follow this pattern; see Kang and Koehler [6]. In the context of censored time series analysis, Shumway et al. [7] considered (hourly or daily) measurements of the concentration of a given substance subject to some detection limits, thus being potentially censored from the right. Lecoutre and Ould-Said [8], Cai [9], and Liang and Uña-Álvarez [10] studied the convergence for stationary $\alpha$-mixing data. However, the convergence for NA data has not been reported.

The main purpose of this paper is to study the kernel density estimator and the Kaplan-Meier kernel estimator of a density function based on censored data when the survival and the censoring times form stationary NA (see the following definition) sequences. Under certain regularity conditions, Berry-Esseen type bounds are derived for the kernel density estimator and the Kaplan-Meier kernel estimator at a fixed point $x$.

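Definition 1. Random variables $X_1, X_2, \ldots, X_n$, $n \ge 2$, are said to be negatively associated (NA) if for every pair of disjoint subsets $A_1$ and $A_2$ of $\{1, 2, \ldots, n\}$,

$$\operatorname{Cov}\bigl( f_1(X_i;\, i \in A_1),\ f_2(X_j;\, j \in A_2) \bigr) \le 0,$$

where $f_1$ and $f_2$ are increasing for every variable (or decreasing for every variable) such that this covariance exists. A sequence of random variables $\{X_n; n \ge 1\}$ is said to be NA if every finite subfamily is NA.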

Obviously, if $\{X_n; n \ge 1\}$ is a sequence of NA random variables and $\{f_n; n \ge 1\}$ is a sequence of nondecreasing (or nonincreasing) functions, then $\{f_n(X_n); n \ge 1\}$ is also a sequence of NA random variables.

This definition was introduced by Joag-Dev and Proschan [11]. Statistical tests depend heavily on the sampling scheme. Random sampling without replacement from a finite population is NA but not independent. NA sampling has wide applications, such as those in multivariate statistical analysis and reliability theory. Because of these applications, the limit behavior of NA random variables has received increasing attention recently. One can refer to Joag-Dev and Proschan [11] for fundamental properties, Matuła [12] for the three-series theorem, and Wu and Jiang [13, 14] for the strong convergence.
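The remark about sampling without replacement can be checked by hand in the simplest case. The sketch below computes the exact covariance of the first two draws without replacement from a small finite population, which is the instance of the NA inequality where both functions are the identity on a single coordinate. A negative value here is only the pairwise consequence of negative association, not a proof of the full property; the function name and the population are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def covariance_without_replacement(population):
    """Exact Cov(X1, X2) for two draws without replacement from `population`."""
    pairs = list(permutations(population, 2))   # all equally likely ordered pairs
    m = len(pairs)
    mean1 = sum(Fraction(a) for a, _ in pairs) / m
    mean2 = sum(Fraction(b) for _, b in pairs) / m
    return sum((Fraction(a) - mean1) * (Fraction(b) - mean2) for a, b in pairs) / m

population = [1, 2, 3, 4, 5]
cov = covariance_without_replacement(population)
print(cov)  # -1/2
```

For a population of size $N$ with variance $\sigma^2$ (divisor $N$), the value agrees with the classical formula $-\sigma^2/(N-1)$; here $\sigma^2 = 2$ and $N = 5$.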

2. Main Results

In what follows, let $H$ be the d.f. of the $Z_i$'s. Since the sequences $\{X_n; n \ge 1\}$ and $\{Y_n; n \ge 1\}$ are independent, it follows that $1 - H = (1 - F)(1 - G)$.
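The relation between the d.f. of the observed times and the two marginal d.f.s follows in one line from independence. Written out, with $Z_i = X_i \wedge Y_i$ and $F$, $G$, $H$ denoting the d.f.s of the survival, censoring, and observed times (notation assumed here):

```latex
\begin{align*}
1 - H(t) &= P(Z_i > t) = P(X_i > t,\ Y_i > t) \\
         &= P(X_i > t)\, P(Y_i > t) = \bigl(1 - F(t)\bigr)\bigl(1 - G(t)\bigr).
\end{align*}
```

Only the independence of $X_i$ and $Y_i$ for the same index $i$ is used in this step; the dependence within each sequence plays no role.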

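Define (possibly infinite) times $\tau_F$, $\tau_G$, and $\tau_H$ by

$$\tau_F = \inf\{y : F(y) = 1\}, \qquad \tau_G = \inf\{y : G(y) = 1\}, \qquad \tau_H = \inf\{y : H(y) = 1\}.$$

Then, $\tau_H = \tau_F \wedge \tau_G$.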

We give the following four lemmas, which are helpful in proving our theorems.

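Lemma 2 (Chang and Rao, [15]). Let $X$ and $Y$ be random variables; then for any $a > 0$,

$$\sup_t \bigl| P(X + Y \le t) - \Phi(t) \bigr| \le \sup_t \bigl| P(X \le t) - \Phi(t) \bigr| + \frac{a}{\sqrt{2\pi}} + P(|Y| > a),$$

where, here and in the sequel, $\Phi$ denotes the standard normal distribution function.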
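For orientation, the usual argument behind a perturbation bound of this kind is short. The sketch below assumes the standard form of the inequality, with a perturbation level $a > 0$ and standard normal density $\varphi$; these symbols are assumptions of the sketch.

```latex
\begin{align*}
\{X + Y \le t\} &\subset \{X \le t + a\} \cup \{|Y| > a\}, \\
P(X + Y \le t) - \Phi(t)
  &\le \bigl[P(X \le t + a) - \Phi(t + a)\bigr]
     + \bigl[\Phi(t + a) - \Phi(t)\bigr] + P(|Y| > a) \\
  &\le \sup_s \bigl|P(X \le s) - \Phi(s)\bigr|
     + \frac{a}{\sqrt{2\pi}} + P(|Y| > a),
\end{align*}
```

The last step uses $\Phi(t + a) - \Phi(t) \le a \sup_s \varphi(s) = a/\sqrt{2\pi}$. The lower bound follows in the same way from $\{X \le t - a\} \subset \{X + Y \le t\} \cup \{|Y| > a\}$.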

Lemma 3 (Su et al. [16, Theorem 1]). Let be a sequence of NA r.v.s. with zero means and , and . Then for , where depends only on .

Lemma 4. Let be a sequence of NA r.v.s. with continuous d.f. , and let be the empirical d.f. based on the segments . Then

Proof. Similar to the proof of Lemma 4 in Yang [17], we can prove Lemma 4.

Lemma 5 (Wu and Chen [18, Theorem 1.3]). Let and be two sequences of NA r.v.s. Suppose that the sequences and are independent. Then for any ,

In order to formulate our main results, we now list some assumptions.
() and are two sequences of stationary NA random variables, and and are independent.
() Suppose that , , and and have bounded derivative in a neighborhood of .
() For all integers , the conditional distribution , given , has a density , and for all , for and some , where represents a neighborhood of .
() The kernel is a bounded derivative function with for and .
() Let , , and be positive integers with where .

Remark 6. () implies and .
Let , .

Theorem 7. Suppose that are satisfied; then where , , , .
Consider the following: where .
Furthermore, if then

Theorem 8. Assume that the conditions of Theorem 7 hold. Then Furthermore, if (16) holds, then

3. Proofs

Proof of Theorem 7. We observe that, by (3),
Let , , , where and then By (20), We first estimate , , and . Obviously, implies that and are stationary; thus,
From , , and , we obtain Hence, by , .
For and , by ,
Therefore, by , By , , , and Lemma 2.3 of Zhang [19], for ,
Thus, by and , Therefore, by the combination of , (24), (26), (28), and (30),
Similarly,
By (26), (27), , , and ,
By (25), , and , Note that for any random variables and ; from (31)–(33), Therefore, from the combination of (23) and (31)–(34), it follows that Thus, (14) holds.
Now, we prove (15). Let , , , . Then, . According to Lemma 2, (14), (20), (32), and (33), we have
Let , be independent random variables with the same distribution as for . Put , . Obviously, Note that and from (20) and (24). By (14), (30), (32), and (33), Note that , , are independent random variables, and . Therefore, by (from (39)), (14), and Berry-Esseen inequality (cf. Petrov [20, page 154, Theorem 5.7]), there exists some constant such that
Similar to (26), we can get and . It is easy to see from Property P7 of Joag-Dev and Proschan [11] that is also a sequence of NA r.v.s., so by using Lemma 3, we have
Assume that and are the characteristic functions of and , respectively. By Esseen inequality (cf. Petrov [20, page 146, Theorem 5.3]), for any , there exists some constant such that
By Theorem 10 in Newman [21], (14), and (30), Therefore, On applying (39)–(41), we have Thus, Choosing , then by (42)–(46), Therefore, by the combination of (37)–(39), (41), and (47), (15) holds.
Finally, we prove (17). By Lemma 2 and (15), for any ,
Applying (14), , , and the differential mean value theorem, there exists a constant , such that Hence, there exists a constant sufficiently large such that . Let in (48); then . Therefore, by (48), (17) holds.

Proof of Theorem 8. Using (15) and Lemma 2,
Let be the empirical d.f. of . Then, by (2),
Thus, by Lemmas 4 and 5, for ,
Using (14), we get
Therefore, (18) holds from (50) and (53).
Using (18), similar to the proof of (17), we can prove (19). This completes the proof of Theorem 8.

Acknowledgments

The authors are very grateful to the referees and the editors for their valuable comments and helpful suggestions that improved the clarity and readability of the paper. This paper is supported by the National Natural Science Foundation of China (11061012), the Program to Sponsor Teams for Innovation in the Construction of Talent Highlands in Guangxi Institutions of Higher Learning ((2011) 47), and the Support Program of the Guangxi China Science Foundation (2012GXNSFAA053010 and 2013GXNSFDA019001).

References

[1] E. L. Kaplan and P. Meier, “Nonparametric estimation from incomplete observations,” Journal of the American Statistical Association, vol. 53, pp. 457–481, 1958.
[2] A. Földes and L. Rejtő, “A LIL type result for the product limit estimator,” Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 56, no. 1, pp. 75–86, 1981.
[3] M. G. Gu and T. L. Lai, “Functional laws of the iterated logarithm for the product-limit estimator of a distribution function under random censorship or truncation,” The Annals of Probability, vol. 18, no. 1, pp. 160–189, 1990.
[4] R. D. Gill, Censoring and Stochastic Integrals, vol. 124 of Mathematical Centre Tracts, Mathematisch Centrum, Amsterdam, The Netherlands, 1980.
[5] L. Q. Sun and L. X. Zhu, “A Berry-Esseen type bound for kernel density estimators under random censorship,” Acta Mathematica Sinica, vol. 42, no. 4, pp. 627–636, 1999.
[6] S.-S. Kang and K. J. Koehler, “Modification of the Greenwood formula for correlated response times,” Biometrics, vol. 53, no. 3, pp. 885–899, 1997.
[7] R. H. Shumway, A. S. Azari, and P. Johnson, “Estimating mean concentrations under transformation for environmental data with detection limits,” Technometrics, vol. 31, pp. 347–356, 1988.
[8] J.-P. Lecoutre and E. Ould-Said, “Convergence of the conditional Kaplan-Meier estimate under strong mixing,” Journal of Statistical Planning and Inference, vol. 44, no. 3, pp. 359–369, 1995.
[9] Z. Cai, “Estimating a distribution function for censored time series data,” Journal of Multivariate Analysis, vol. 78, no. 2, pp. 299–318, 2001.
[10] H.-Y. Liang and J. de Uña-Álvarez, “A Berry-Esseen type bound in kernel density estimation for strong mixing censored samples,” Journal of Multivariate Analysis, vol. 100, no. 6, pp. 1219–1231, 2009.
[11] K. Joag-Dev and F. Proschan, “Negative association of random variables, with applications,” The Annals of Statistics, vol. 11, no. 1, pp. 286–295, 1983.
[12] P. Matuła, “A note on the almost sure convergence of sums of negatively dependent random variables,” Statistics & Probability Letters, vol. 15, no. 3, pp. 209–213, 1992.
[13] Q. Wu and Y. Jiang, “A law of the iterated logarithm of partial sums for NA random variables,” Journal of the Korean Statistical Society, vol. 39, no. 2, pp. 199–206, 2010.
[14] Q. Wu and Y. Jiang, “Chover's law of the iterated logarithm for negatively associated sequences,” Journal of Systems Science & Complexity, vol. 23, no. 2, pp. 293–302, 2010.
[15] M. N. Chang and P. V. Rao, “Berry-Esseen bound for the Kaplan-Meier estimator,” Communications in Statistics, vol. 18, no. 12, pp. 4647–4664, 1989.
[16] C. Su, L. Zhao, and Y. Wang, “Moment inequalities and weak convergence for negatively associated sequences,” Science in China A, vol. 40, no. 2, pp. 172–182, 1997.
[17] S. C. Yang, “Consistency of the nearest neighbor estimator of the density function for negatively associated samples,” Acta Mathematicae Applicatae Sinica, vol. 26, no. 3, pp. 385–394, 2003.
[18] Q. Y. Wu and P. Y. Chen, “Strong representation results of Kaplan-Meier estimator for censored NA data,” Journal of Inequalities and Applications, vol. 2013, article 340, 2013.
[19] L.-X. Zhang, “The weak convergence for functions of negatively associated random variables,” Journal of Multivariate Analysis, vol. 78, no. 2, pp. 272–298, 2001.
[20] V. V. Petrov, Limit Theorems of Probability Theory, vol. 4, Oxford University Press, New York, NY, USA, 1995.
[21] C. M. Newman, “Asymptotic independence and limit theorems for positively and negatively dependent random variables,” in Inequalities in Statistics and Probability, vol. 5 of IMS Lecture Notes Monogr. Ser., pp. 127–140, Institute of Mathematical Statistics, Hayward, Calif, USA, 1984.

Copyright

Copyright © 2013 Qunying Wu and Pingyan Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
