Research Article  Open Access
Abdelkader Mokkadem, Mariane Pelletier, Baba Thiam, "Joint Behaviour of Semirecursive Kernel Estimators of the Location and of the Size of the Mode of a Probability Density Function", Journal of Probability and Statistics, vol. 2011, Article ID 564297, 27 pages, 2011. https://doi.org/10.1155/2011/564297
Joint Behaviour of Semirecursive Kernel Estimators of the Location and of the Size of the Mode of a Probability Density Function
Abstract
Let and denote the location and the size of the mode of a probability density. We study the joint convergence rates of semirecursive kernel estimators of and . We show how the estimation of the size of the mode allows measuring the relevance of the estimation of its location. We also show that, beyond their computational advantage over nonrecursive estimators, semirecursive estimators are preferable for the construction of confidence regions.
1. Introduction
Let be independent and identically distributed valued random variables with unknown probability density . The aim of this paper is to study the joint kernel estimation of the location and of the size of the mode of . The mode is assumed to be unique, that is, for any , and nondegenerate, that is, the second order differential at the point is nonsingular (in the sequel, will denote the differential of order of a multivariate function ).
The problem of estimating the location of the mode of a probability density has been widely studied. Kernel methods were considered, among many others, by Parzen [1], Nadaraya [2], Van Ryzin [3], Rüschendorf [4], Konakov [5], Samanta [6], Eddy ([7, 8]), Romano [9], Tsybakov [10], Vieu [11], Mokkadem and Pelletier [12], and Abraham et al. ([13, 14]). The problem of estimating the size of the mode was brought up by several authors (see, e.g., Romano [9] and Vieu [11]), but, to our knowledge, the behaviour of estimators of the size of the mode has not been investigated in detail, although there are at least two statistical motivations for estimating this parameter. First, the use of an estimator of the size is necessary for the construction of confidence regions for the location of the mode (see, e.g., Romano [9]). As a more important motivation, the estimation of the height of the peak gives information on the shape of a density in a neighbourhood of its mode and, consequently, allows measuring the relevance of the location of the mode as a parameter; this motivation is related to the remark made by Vieu [11], who pointed out that the location of the mode is more related to the shape of the derivative of , whereas the size of the mode is more related to the shape of the density itself.
Let us mention that, even though the problem of estimating the size of the mode was not investigated in the framework of density estimation, it was studied in the framework of regression estimation. Müller [15] proves in particular the joint asymptotic normality and independence of kernel estimators of the location and of the size of the mode in the framework of nonparametric regression models with fixed design. In the framework of nonparametric regression with random design, a similar result is obtained by Ziegler ([16, 17]) for kernel estimators and by Mokkadem and Pelletier [18] for estimators derived from stochastic approximation methods.
This paper is focused on semirecursive kernel estimators of and . To explain why we chose semirecursive estimators, let us first recall that the well-known (nonrecursive) kernel estimator of the location of the mode introduced by Parzen [1] is defined as a random variable satisfying where is Rosenblatt's estimator of ; more precisely, where the bandwidth is a sequence of positive real numbers going to zero and the kernel is a continuous function satisfying , . The asymptotic behaviour of has been widely studied (see, among others, [1–9, 11, 12]), but, from a computational point of view, the estimator has a main drawback: its update, from a sample size to a sample size , is far from being immediate. Applying the stochastic approximation method, Tsybakov [10] introduced the recursive kernel estimator of defined as where is arbitrarily chosen and the stepsize is a sequence of positive real numbers going to zero. The great advantage of this estimator is that its update is very rapid. Unfortunately, for reasons inherent to the properties of stochastic approximation algorithms, very strong assumptions on the density must be required to ensure its consistency. A recursive version of Rosenblatt's density estimator was introduced by Wolverton and Wagner [19] (and discussed, among others, by Yamato [20], Davies [21], Devroye [22], Menon et al. [23], Wertz [24], Wegman and Davies [25], Roussas [26], and Mokkadem et al. [27]). Let us recall that is defined as Its update from a sample of size to one of size is immediate since clearly satisfies the recursive relation This property of rapid update of the density estimator is particularly important in the framework of mode estimation, since the number of points where must be estimated is very large. We thus define a semirecursive version of Parzen's estimator of the location of the mode by using the Wolverton-Wagner recursive density estimator, rather than Rosenblatt's density estimator.
More precisely, our estimator of the location of the mode is a random variable satisfying
Let us now come back to the problem of estimating the size of the mode. The ordinarily used estimator is defined as ( being Rosenblatt's density estimator and Parzen's mode estimator); the consistency of is sufficient to allow the construction of confidence regions for (see, e.g., Romano [9]). Adapting the construction of to the semirecursive framework would lead us to estimate by However, this estimator has two main drawbacks (as well as ). First, the use of a higher order kernel is necessary for to satisfy a central limit theorem and thus for the construction of confidence intervals of (and of confidence regions for ). Moreover, in the case when a higher order kernel is used, it is not possible to choose a bandwidth for which both estimators and converge at the optimal rate. These observations lead us to use two different bandwidths, one for the estimation of , the other one for the estimation of . More precisely, let be the recursive kernel density estimator defined as where the bandwidth may be different from used in the definition of (see (1.4)); we estimate the size of the mode by where is still defined by (1.6) and thus with the first bandwidth .
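The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it takes dimension one, a Gaussian kernel, approximates the supremum by an argmax over a fixed grid, and uses the standard Wolverton-Wagner update, in which the estimate after n observations equals (1 - 1/n) times the previous estimate plus the new kernel term divided by n. The names `RecursiveKDE` and `semirecursive_mode` and the bandwidth exponents -1/7 and -1/5 are hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, used here as the kernel (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

class RecursiveKDE:
    """Wolverton-Wagner-type recursive density estimate on a fixed grid (d = 1)."""

    def __init__(self, grid, bandwidth):
        self.grid = np.asarray(grid, dtype=float)
        self.bandwidth = bandwidth          # callable: n -> bandwidth used for the n-th point
        self.n = 0
        self.values = np.zeros_like(self.grid)

    def update(self, x):
        """Incorporate one new observation in O(len(grid)) operations."""
        self.n += 1
        h = self.bandwidth(self.n)
        new_term = gaussian_kernel((self.grid - x) / h) / h
        # Same as values = (1 - 1/n) * values + new_term / n
        self.values += (new_term - self.values) / self.n

def semirecursive_mode(sample, grid, h, h_tilde):
    """Return (location estimate, size estimate) using two bandwidth sequences.

    The location maximizes the first recursive estimate (bandwidth h);
    the size is the second recursive estimate (bandwidth h_tilde)
    evaluated at that location.
    """
    f_loc = RecursiveKDE(grid, h)
    f_size = RecursiveKDE(grid, h_tilde)
    for x in sample:
        f_loc.update(x)
        f_size.update(x)
    k = int(np.argmax(f_loc.values))
    return f_loc.grid[k], f_size.values[k]

# Illustrative run on simulated standard normal data (hypothetical settings).
rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=2000)
grid = np.linspace(-4.0, 4.0, 801)
theta_hat, mu_hat = semirecursive_mode(
    sample, grid,
    h=lambda n: n ** (-1.0 / 7.0),        # bandwidth for the location (illustrative)
    h_tilde=lambda n: n ** (-1.0 / 5.0),  # second bandwidth for the size (illustrative)
)
```

Each new observation only touches the stored grid values once, which is the rapid-update property the text emphasizes; a nonrecursive estimate with a single bandwidth depending on the sample size would have to be recomputed from all observations.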
The purpose of this paper is the study of the joint asymptotic behaviour of and . We first prove the strong consistency of both estimators. We then establish the joint weak convergence rate of and . We prove in particular that adequate choices of the bandwidths lead to the asymptotic normality and independence of these estimators and that the use of different bandwidths allows obtaining simultaneously the optimal convergence rate of both estimators. We then apply our weak convergence rate result to the construction of confidence regions for and illustrate this application with a simulation study. This application highlights the advantage of using semirecursive estimators rather than nonrecursive estimators. It also shows how the estimation of the size of the mode gives information on the relevance of estimating its location. Finally, we establish the joint strong convergence rate of and .
2. Assumptions and Main Results
Throughout this paper, and are defined as and for all , where and are two positive functions.
2.1. Strong Consistency
The conditions we require for the strong consistency of and are the following.
(A1) (i) is an integrable, differentiable, and even function such that .
(ii) There exists such that .
(iii) is Hölder continuous.
(iv) There exists such that is a bounded function.
(A2) (i) is uniformly continuous on .
(ii) There exists such that .
(iii) There exists such that is a bounded function.
(iv) There exists such that for all .
(A3) (i) The function is locally bounded and varies regularly with exponent with .
(ii) The function is locally bounded and varies regularly with exponent with .
Remark 2.1. Note that (A1)(iv) implies that is bounded.
Remark 2.2. Let us recall that a positive function (not necessarily monotone) defined on is slowly varying if and that a function varies regularly with exponent , , if and only if it is of the form with slowly varying (see, e.g., Feller [28, page 275]). Typical examples of regularly varying functions are , , , , and so on.
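For reference, the two standard definitions recalled in Remark 2.2 can be written out as follows. The symbols L, G, and ρ are notation assumed here for illustration only; the example in the last comment is a textbook one.

```latex
% Notation (L, G, \rho) is assumed for illustration.
% Slow variation of a positive function L:
\lim_{t \to \infty} \frac{L(tx)}{L(t)} = 1 \quad \text{for all } x > 0 .
% Regular variation of G with exponent \rho:
G(x) = x^{\rho} L(x), \qquad \rho \in \mathbb{R}, \quad L \text{ slowly varying}.
% Example: \log x is slowly varying, since
% \log(tx)/\log t = 1 + \log x / \log t \to 1 as t \to \infty,
% so x^{\rho} \log x varies regularly with exponent \rho.
```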
Proposition 2.3. Let and be defined by (1.6) and (1.9), respectively. (i) Under (A1), (A2), and (A3)(i), a.s. (ii) Under (A1)–(A3), a.s.
Let us mention that the assumptions required on the probability density to establish the strong consistency of the semirecursive estimator of the location of the mode are slightly stronger than those needed for the nonrecursive estimator (see, e.g., [9, 12]), but are much weaker than those needed for the recursive estimator (see [10]). Let us also note that the strong consistency of can be proved in the same way as that of .
2.2. Weak Convergence Rate
In order to state the weak convergence rate of and , we need the following additional assumptions on , , , and .
(A4) (i) is twice differentiable on .
(ii) is integrable.
(iii) For any , is bounded integrable and Hölder continuous.
(iv) is a kernel of order , that is, , , and .
(A5) (i) is nonsingular.
(ii) is times differentiable; and are bounded.
(iii) For any , , and for any , .
(A6) (i) .
(ii) .
Remark 2.4. Note that (A4)(ii) and (A4)(iii) imply that is Lipschitz continuous and integrable; it is thus straightforward to see that (and in particular is bounded).
Let be the matrix defined by and set The following theorem gives the weak convergence rate of the semirecursive kernel mode estimator .
Theorem 2.5 (Weak convergence rate of ). Let be defined by (1.6), and assume that (A1), (A2), (A3)(i), (A4), (A5), and (A6)(i) hold. (1) If there exists such that , then (2) If , then
Remark 2.6. If or if , then , and thus is usually nonzero.
In order to compare the semirecursive estimator with the well-known nonrecursive Parzen estimator , let us recall that Theorem 2.5 holds when is replaced by and by zero (see, e.g., Parzen [1] in the case , and Mokkadem and Pelletier [12] in the case ). The main advantage of over is that, due to the factor appearing in the definition of , the asymptotic covariance of is smaller than that of ; this property will be discussed again in Section 2.3.
In order to state the weak convergence rate of , we need the following notation (where and where is a positive sequence):
Theorem 2.7 (Weak convergence rate of ). Let be defined by (1.9), and assume that (A1)–(A6) hold. (1) If there exists such that and if there exists such that , then (2) If and if there exists such that , then
Remark 2.8. If or if , then , so that is usually nonzero.
Remark 2.9. Following the proof of Theorem 2.7, it can be shown that the results of Theorem 2.7 also hold when is replaced by and by zero.
Let us consider the case when , that is, the case when the same bandwidth is used to define and (or and ). If is a second-order kernel (i.e., if ), then the condition implies that for all , so that the condition required in Part 1 of Theorem 2.7 is not satisfied: the limit of (or of ) suitably normalized is then necessarily degenerate. This is the first main drawback of using the same bandwidth to estimate the location and the size of the mode: to construct confidence intervals for , the use of higher-order kernels is unavoidable.
The estimation of the size of the mode is of course not independent of the estimation of the location, since the estimator is constructed with the help of the estimator . To get a good estimation of the size of the mode, it seems obvious that should be computed with a bandwidth leading to its optimal convergence rate (or, at least, to a convergence rate close to the optimal one). The main information given by Theorem 2.7 is that, for to converge at the optimal rate, the use of a second bandwidth is then necessary.
Now, set and, for any , where is the identity matrix. The following corollary gives a central limit theorem for the couple .
Corollary 2.10 (Joint weak convergence rate of and ). Let and be defined by (1.6) and (1.9), respectively, and let the assumptions of the first parts of Theorems 2.5 and 2.7 hold. Then,
Remark 2.11. Following the proof of Corollary 2.10, it can be shown that the result of Corollary 2.10 also holds when is replaced by , by , and and by zero.
Let us point out that, in view of Corollary 2.10, in the case when the couple satisfies a central limit theorem, the estimators and are asymptotically independent, although, in its definition, the estimator of the size of the mode is heavily connected to that of the location of the mode. This property is not very surprising since, as pointed out by Vieu [11], the location of the mode gives information on the shape of the density derivative, whereas the size of the mode gives information on the shape of the density itself. This observation is related to the fact that the weak (and strong) convergence rate of is given by that of the gradient of , whereas the weak (and strong) convergence rate of is given by that of itself; since the variance of the density estimators converges to zero faster than that of the estimators of the density derivatives, the asymptotic independence of and is fully explained.
Let us finally say a word about our assumptions on the bandwidths. In the framework of nonrecursive estimation, there is no need to assume that and are regularly varying sequences. In the case of semirecursive estimation, this assumption obviously cannot be omitted, since the exponents and appear in the expressions of the asymptotic bias and variance . This might be seen as a slight drawback of semirecursive estimation; however, as shown in the following section, it turns out to be an advantage, since the asymptotic variances of the semirecursive estimators are smaller than those of the nonrecursive estimators.
2.3. Construction of Confidence Regions and Simulation Studies
The application of Theorems 2.5, 2.7, and Corollary 2.10 allows the construction of confidence regions for the location and for the size of the mode, as well as confidence ellipsoids for the couple . Hall [29] shows that, in order to construct confidence regions, avoiding bias estimation by a slight undersmoothing is more efficient than explicit bias correction. In the framework of undersmoothing, the asymptotic bias of the estimator is negligible compared with its asymptotic variance; from the point of view of estimation by confidence regions, the parameter to minimize is thus the asymptotic variance. Now, set and note that, in view of Corollary 2.10 and of Remark 2.11, (resp. ) is the asymptotic covariance matrix of the semirecursive estimators (resp., of the nonrecursive estimators ). In order to construct confidence regions for the location and/or size of the mode, it is thus much preferable to use semirecursive estimators rather than nonrecursive estimators. Simulation studies confirm this theoretical conclusion, whichever parameter (, or ) the confidence regions are constructed for. For the sake of brevity, we do not give all these simulation results here but focus on the construction of confidence ellipsoids for ; the aim of this example is of course to highlight the advantage of using semirecursive estimators rather than nonrecursive estimators but also to show how this confidence region gives information on the shape of the density and consequently allows measuring the relevance of the location of the mode as a parameter.
To construct confidence regions for , we consider the case . The following corollary is a straightforward consequence of Corollary 2.10.
Corollary 2.12. Let and be defined by (1.6) and (1.9), respectively, and let the assumptions of the first parts of Theorems 2.5 and 2.7 hold. We then have Moreover, (2.10) still holds when the parameters and are replaced by consistent estimators.
Remark 2.13. In view of Remark 2.11, in the case when the nonrecursive estimators and are used, (2.10) becomes (and, again, this convergence still holds when the parameters and are replaced by consistent estimators).
Let (resp., ) be the recursive estimator (resp., the nonrecursive Rosenblatt's estimator) of computed with the help of a bandwidth , and set Moreover, let be such that , where is distributed; in view of Corollary 2.12 and Remark 2.13, the sets are confidence ellipsoids for with asymptotic coverage level . Let us stress that both confidence regions have the same asymptotic level, but the lengths of the axes of the first one (constructed with the help of the semirecursive estimators and ) are smaller than those of the second one (constructed with the help of the nonrecursive estimators and ).
We now present simulation results. In order to see the relationship between the shape of the confidence ellipsoids and that of the density, the density we consider is the density of the distribution, the parameter taking the values 0.3, 0.4, 0.5, 0.7, 0.75, 1, 1.5, 2, and 2.5. We use the sample size and the coverage level (and thus ). In each case, the number of simulations is . The kernel we use is the standard Gaussian density; the bandwidths are Table 1 gives, for each value of , the empirical values of , , , (with respect to the 5000 simulations) and
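The way such an axis-aligned confidence ellipse is assembled can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's exact formulas: it assumes dimension one (so the couple is two-dimensional and the limiting quadratic form is chi-square with 2 degrees of freedom), a diagonal asymptotic covariance (as suggested by the asymptotic independence in Corollary 2.10), and that the normalizing rates and asymptotic variances are supplied by the user. All function names, parameter names, and numerical values are hypothetical.

```python
import math

def chi2_2_quantile(alpha):
    """Return c with P(Z <= c) = 1 - alpha for Z chi-square with 2 degrees of freedom.

    For 2 degrees of freedom the distribution function is 1 - exp(-c / 2),
    so the quantile has the closed form -2 * log(alpha).
    """
    return -2.0 * math.log(alpha)

def ellipse_half_axes(var_theta, var_mu, rate_theta, rate_mu, alpha):
    """Half-lengths of the two axes of the confidence ellipse.

    The ellipse is the set of (theta, mu) such that
    rate_theta^2 (theta_hat - theta)^2 / var_theta
      + rate_mu^2 (mu_hat - mu)^2 / var_mu <= c_alpha.
    """
    c = chi2_2_quantile(alpha)
    return (math.sqrt(c * var_theta) / rate_theta,
            math.sqrt(c * var_mu) / rate_mu)

def in_ellipse(theta, mu, theta_hat, mu_hat, half_axes):
    """Check whether the point (theta, mu) lies in the ellipse centred at the estimates."""
    a, b = half_axes
    return ((theta - theta_hat) / a) ** 2 + ((mu - mu_hat) / b) ** 2 <= 1.0

# Hypothetical inputs: estimated variances and normalizing rates.
half_axes = ellipse_half_axes(var_theta=0.9, var_mu=0.1,
                              rate_theta=12.0, rate_mu=30.0, alpha=0.05)
covered = in_ellipse(0.0, 0.399, theta_hat=0.03, mu_hat=0.39, half_axes=half_axes)
```

With equal normalizing rates, a smaller asymptotic variance translates directly into shorter axes, which is how the comparison between the semirecursive and nonrecursive ellipsoids in the text can be read.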

: the empirical length of the axis of the confidence ellipsoid ;
: the empirical length of the axis of the confidence ellipsoid ;
: the empirical length of the axis of the confidence ellipsoid ;
: the empirical length of the axis of the confidence ellipsoid ;
: the empirical coverage level of the confidence ellipsoid ;
: the empirical coverage level of the confidence ellipsoid .
Confirming our theoretical results, we see that the empirical coverage levels of both confidence ellipsoids and are similar but that the empirical areas of the ellipsoids (constructed with the help of the semirecursive estimators) are always smaller than those of the ellipsoids (constructed with the help of the nonrecursive estimators).
Let us now discuss the interest of the estimation of the size of the mode and that of the joint estimation of the location and size of the mode. Both estimations give information on the shape of the probability density and, consequently, allow measuring the relevance of the location of the mode as a parameter. Of course, the parameter is significant only in the case when the height of the peak is large enough; since we consider here the example of the distribution, this corresponds to the case when is small enough. Estimating only the size of the mode gives a first idea of the shape of the density around the location of the mode (for instance, when the size is estimated around 0.16, it is clear that the density is very flat). Now, the shape of the confidence ellipsoids allows getting a more precise idea. As a matter of fact, for small values of , the length of the axis is larger than that of the axis; as increases, the length of the axis decreases, and that of the axis increases (for , the length of the axis is more than 20 times that of the axis). Let us underline that these variations of the lengths of the axes are not due to poor estimation results; Table 2 gives the values of the lengths (resp., ) of the axis, (resp., ) of the axis of the ellipsoids computed with the semirecursive estimators and (resp., with the nonrecursive estimators and ) in the case when the true values of the parameters and are used (that is, by straightforwardly applying (2.10) and (2.11)).

2.4. Strong Convergence Rate
To establish the strong convergence rate of and , we need the following additional assumption.
(A7) (i) is differentiable, and its derivative varies regularly with exponent .
(ii) is differentiable, and its derivative varies regularly with exponent .
The following two theorems give the almost sure convergence rate of and of , respectively. Before stating them, let us note that Proposition 2.3 in Mokkadem and Pelletier [12] ensures that the matrix (and thus the matrix ) is nonsingular.
Theorem 2.14 (Strong convergence rate of ). Let be defined by (1.6), and assume that (A1), (A2), (A3)(i), (A4), (A5), (A6)(i), and (A7)(i) hold. (1) If there exists such that , then, with probability one, the sequence is relatively compact and its limit set is the ellipsoid (2) If , then, with probability one, .
Theorem 2.15 (Strong convergence rate of ). Let be defined by (1.9), and assume that (A1)–(A7) hold. (1) If there exists such that and if there exists such that , then, with probability one, the sequence is relatively compact, and its limit set is the interval . (2) If and if there exists such that , then, with probability one, .
To establish a law of the iterated logarithm for the couple , we need the following additional assumption.
(A8) There exists such that
Remark 2.16. Assumption (A8) holds when . In the case when , set and ; (A8) is then satisfied when for large enough.
Corollary 2.17 (Joint strong convergence rate of and ). Let and be defined by (1.6) and (1.9), respectively; let the assumptions of Parts 1 of Theorems 2.14 and 2.15 hold, as well as (A8). Then, with probability one, the sequence is relatively compact, and its limit set is the ellipsoid
Laws of the iterated logarithm for Parzen's nonrecursive kernel mode estimator in the multivariate framework were established by Mokkadem and Pelletier [12]. The proof techniques used in the framework of nonrecursive estimators are totally different from those employed to prove Theorem 2.14. This is due to the following fundamental difference between the nonrecursive estimator and the semirecursive estimator : the study of the asymptotic behaviour of comes down to that of a triangular sum of independent variables, whereas the study of the asymptotic behaviour of reduces to that of a sum of independent variables. Of course, this difference is not very important for the study of the weak convergence rate. But, for the study of the strong convergence rate, it makes the case of semirecursive estimation much easier than the case of nonrecursive estimation. In particular, in contrast to the weak convergence rate, the joint strong convergence rate of the nonrecursive estimators and cannot be obtained by following the lines of the proof of Theorem 2.14 and remains an open question.
3. Proofs
Let us first note that an important consequence of (A3)(i) is that Moreover, for all , As a matter of fact: (i) if , (3.2) follows easily from (3.1); (ii) if , since is summable, (3.2) holds; (iii) if , then , so that , and thus (3.2) follows.
Of course, in view of (A3)(ii), (3.1) and (3.2) also hold when and are replaced by and , respectively.
Our proofs are now organized as follows. Section 3.1 is devoted to the proof of Proposition 2.3. In Section 3.2, we state some preliminary lemmas, which are crucial in the proof of the convergence rates of and and which are proved in Section 3.6. Section 3.3 is devoted to the proof of Theorems 2.5 and 2.14, Section 3.4 to that of Theorems 2.7 and 2.15, and Section 3.5 to that of Corollaries 2.10 and 2.17.
3.1. Consistency of and : Proof of Proposition 2.3
Since is the mode of and the mode of , we have The application of Theorem 5 in Mokkadem et al. [27] with and ensures that, for any , there exists such that . In view of (3.1), since , we can write The Borel-Cantelli lemma then ensures that a.s. Since , it follows from (3.3) that a.s. Since is continuous, since , and since is the unique mode of , we deduce that a.s. Now, we have where the last inequality follows from (3.3). Following the proof of the strong uniform convergence of , we show that a.s. It follows that a.s., which concludes the proof of Proposition 2.3.
3.2. Some Preliminary Lemmas
The aim of this section is to state some properties of the density estimators and of their derivatives, which are crucial in the proof of Theorems 2.5–2.15 and of Corollaries 2.10 and 2.17.
3.2.1. Strong Uniform Convergence Rate of the Derivatives of the Density Estimators
For any tuple , set and, for any function , let denote the th partial derivative of (if , then ). In order to prove Theorems 2.5–2.15 and Corollaries 2.10 and 2.17, we need to know the behaviour of for and that of for . For the sake of conciseness, we state the preliminary Lemmas 3.1 and 3.2 for where either and or and . Moreover, we set if , and if .
Lemma 3.1. Let (A1), (A2), (A3)(i), (A4), (A5), and (A6)(i) hold. Moreover, if , then let (A3)(ii) and (A6)(ii) hold. We have where is defined in (2.1). Moreover, if we set , then
Lemma 3.2. Let be a compact set of , and assume that (A1), (A2), (A3)(i), (A4), (A5), and (A6)(i) hold. Moreover, if , then let (A3)(ii) and (A6)(ii) hold. Then, for all , we have
Lemma 3.1 is proved in Mokkadem et al. [27], Lemma 3.2 in Section 3.6.
3.2.2. Convergence Rate of
Lemma 3.3. Under Assumptions (A1), (A2), (A3)(i), (A4), (A5), and (A6)(i), we have Under Assumptions (A1)–(A6), we have
Lemma 3.4. Under Assumptions (A1), (A2), (A3)(i), (A4), (A5), (A6)(i), and (A7)(i), with probability one, the sequence is relatively compact, and its limit set is . Under Assumptions (A1)–(A7), with probability one, the sequence is relatively compact, and its limit set is the interval . Under Assumptions (A1)–(A8), with probability one, the sequence is relatively compact, and its limit set is .
3.3. Convergence Rate of : Proof of Theorems 2.5 and 2.14
In order to prove Theorems 2.5 and 2.14, we first show that the weak and strong asymptotic behaviours of are given by those of (see Section 3.3.1) and then deduce the convergence rates of from those of (see Section 3.3.2).
3.3.1. Relationship between and
By definition of , we have , so that For each , a Taylor expansion applied to the real valued application implies the existence of such that Define the matrix by setting ; (3.12) can then be rewritten as Now, let be a compact set of containing . The combination of Lemmas 3.1 and 3.2 with , , and ensures that, for any and any ,
Since is continuous in a neighbourhood of and since a.s., (3.15) ensures that In view of (3.14), we can thus state the following lemma.
Lemma 3.5. Under Assumptions (A1), (A2), (A3)(i), (A4), (A5), and (A6)(i), the weak and strong asymptotic behaviour of is given by that of .
3.3.2. Proof of Theorems 2.5 and 2.14
Theorem 2.5 (resp., Theorem 2.14) follows straightforwardly from the application of Lemma 3.5, of the first part of Lemma 3.3 (resp., of the first part of Lemma 3.4), and of the following lemma.
Lemma 3.6. Let (A1), (A2), (A3)(i), (A4), (A5), and (A6)(i) hold. (1) If , then . (2) If , then .
Let us now prove Lemma 3.6. The application of Lemma 3.1 (with and ) ensures that Let us first consider the case when . By application of (3.1), we have (i) If , then , and thus ; Part 1 of Lemma 3.6 thus follows. (ii) If , then , and Part 2 of Lemma 3.6 also follows.
It remains to prove Part 2 of Lemma 3.6 in the case when . The application of (3.2) then ensures that, for all , In view of (3.17), Part 2 of Lemma 3.6 then follows.
3.4. Convergence Rate of : Proof of Theorems 2.7 and 2.15
In order to prove Theorems 2.7 and 2.15, we show that the weak and strong convergence rates of are given by those of . More precisely, set and assume that the following two lemmas hold.
Lemma 3.7. Let (A1)–(A6) hold. (1) If , then . (2) If , then .
Lemma 3.8. Let (A1)–(A6) hold. For all , a.s.
In order to prove Theorem 2.7, we first note that the application of the second part of Lemma 3.3 yields In view of Lemma 3.7, it follows that if , then and if , then Part 1 (resp., Part 2) of Theorem 2.7 straightforwardly follows from the combination of (3.22) (resp., of (3.23)) and of Lemma 3.8.
The proof of Theorem 2.15 follows that of Theorem 2.7 (except that the second part of Lemma 3.4 is used instead of (3.21)); this proof is thus omitted.
It remains to prove Lemmas 3.7 and 3.8. The proof of Lemma 3.7 follows that of Lemma 3.6 and is omitted. We now prove Lemma 3.8. We first note that ; a Taylor expansion implies the existence of such that and Let be a compact set that contains ; for large enough, we get To get an upper bound of , we thus need to establish upper bounds of and of . (i) Let us recall that the a.s. convergence rate of is given by that of (see Lemma 3.5). Theorem 2.14 can be applied to obtain the exact a.s. convergence rate of . However, to avoid assuming (A7)(i), we apply here Lemmas 3.1 and 3.2 (with and ) and get the following upper bound of the a.s. convergence rate of and thus of : for any and ,