Abstract

A general notion of bootstrapped $\phi$-divergence estimates constructed by exchangeably weighting the sample is introduced. Asymptotic properties of these generalized bootstrapped $\phi$-divergence estimates are obtained by means of empirical process theory and are applied to construct bootstrap confidence sets with asymptotically correct coverage probability. Some practical problems are discussed, including, in particular, the choice of the escort parameter, and several examples of divergences are investigated. Simulation results are provided to illustrate the finite-sample performance of the proposed estimators.

1. Introduction

$\phi$-divergence-based modeling has proved to be a flexible tool and provides a powerful statistical framework in a variety of applied and theoretical contexts (refer to [14] and the references therein). For good recent sources of references to the research literature in this area, along with statistical applications, consult [2, 5]. Unfortunately, in general, the limiting distribution of estimators based on $\phi$-divergences, or of their functionals, depends crucially on the unknown distribution, which is a serious problem in practice. To circumvent this matter, we propose, in this work, a general bootstrap of $\phi$-divergence-based estimators and study some of its properties by means of sophisticated empirical process techniques. A major application of an estimator is the calculation of confidence intervals. By far the most favored confidence interval is the standard interval based on a normal or a Student's $t$-distribution. Such standard intervals are useful tools, but they are based on an approximation that can be quite inaccurate in practice. Bootstrap procedures are an attractive alternative. One way to look at them is as procedures for handling data when one is not willing to make assumptions about the parameters of the populations from which one sampled. The most that one is willing to assume is that the data are a reasonable representation of the population from which they come. One then resamples from the data and draws inferences about the corresponding population and its parameters. The resulting confidence intervals have received the most theoretical study of any topic in bootstrap analysis.

Our main findings, which are analogous to those of Cheng and Huang [6], are summarized as follows. The $\phi$-divergence estimator $\widehat{\alpha}_\phi(\theta)$ and the bootstrap $\phi$-divergence estimator $\widehat{\alpha}^*_\phi(\theta)$ are obtained by optimizing the objective function $h(\theta,\alpha)$ based on the independent and identically distributed (i.i.d.) observations $\mathbf{X}_1,\dots,\mathbf{X}_n$ and the bootstrap sample $\mathbf{X}^*_1,\dots,\mathbf{X}^*_n$, respectively:
$$\widehat{\alpha}_\phi(\theta)=\arg\sup_{\alpha\in\Theta}\frac{1}{n}\sum_{i=1}^{n}h\bigl(\theta,\alpha,\mathbf{X}_i\bigr),\qquad \widehat{\alpha}^*_\phi(\theta)=\arg\sup_{\alpha\in\Theta}\frac{1}{n}\sum_{i=1}^{n}h\bigl(\theta,\alpha,\mathbf{X}^*_i\bigr), \tag{1.1}$$
where $\mathbf{X}^*_1,\dots,\mathbf{X}^*_n$ are independent draws with replacement from the original sample. We mention that $\widehat{\alpha}^*_\phi(\theta)$ can alternatively be expressed as
$$\widehat{\alpha}^*_\phi(\theta)=\arg\sup_{\alpha\in\Theta}\frac{1}{n}\sum_{i=1}^{n}W_{ni}\,h\bigl(\theta,\alpha,\mathbf{X}_i\bigr), \tag{1.2}$$
where the bootstrap weights are given by
$$\bigl(W_{n1},\dots,W_{nn}\bigr)\sim\text{Multinomial}\bigl(n;n^{-1},\dots,n^{-1}\bigr). \tag{1.3}$$
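To fix ideas, the following R sketch computes one draw of the bootstrap estimator in (1.2) under Efron's multinomial weights (1.3). The criterion function `h(theta, alpha, x)` is a placeholder for the dual criterion defined later in (2.9), and `alpha_start` is any reasonable starting value; both are assumptions of this illustration, not objects supplied by the paper.

```r
# One bootstrap replicate of the weighted estimator in (1.2).
# h(theta, alpha, x) is a user-supplied criterion, vectorized in x.
boot_dphi <- function(x, theta, h, alpha_start) {
  n <- length(x)
  # Efron's bootstrap: multinomial weights, cf. (1.3)
  w <- as.vector(rmultinom(1, size = n, prob = rep(1 / n, n)))
  obj <- function(alpha) mean(w * h(theta, alpha, x))
  # maximize the weighted empirical criterion over alpha
  optim(alpha_start, obj, method = "BFGS",
        control = list(fnscale = -1))$par
}
```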

In this paper, we consider the more general exchangeable bootstrap weighting scheme, which includes Efron's bootstrap [7, 8]. The general resampling scheme was first proposed in [9] and extensively studied by Bickel and Freedman [10], who suggested the name "weighted bootstrap"; for example, the Bayesian bootstrap, where $(W_{n1},\dots,W_{nn})=(D_{n1},\dots,D_{nn})$ is equal in distribution to the vector of $n$ spacings of $n-1$ ordered uniform $(0,1)$ random variables, that is,
$$\bigl(D_{n1},\dots,D_{nn}\bigr)\sim\text{Dirichlet}(n;1,\dots,1). \tag{1.4}$$

The interested reader may refer to [11]. The case
$$\bigl(D_{n1},\dots,D_{nn}\bigr)\sim\text{Dirichlet}(n;4,\dots,4) \tag{1.5}$$

was considered in [12, Remark 2.3] and [13, Remark 5]. The Bickel and Freedman result concerning the empirical process has subsequently been generalized to empirical processes based on observations in $\mathbb{R}^d$, $d>1$, as well as in very general sample spaces and for various set- and function-indexed random objects (see, e.g., [14-18]). In this framework, [19] developed similar results for a variety of other statistical functions. This line of research was continued in the work of [20, 21]. There is a huge literature on the application of the bootstrap methodology to nonparametric kernel density and regression estimation, among other statistical procedures, and it is not the purpose of this paper to survey this extensive literature. This being said, it is worth mentioning that the bootstrap as per Efron's original formulation (see [7]) presents some drawbacks: some observations may be used more than once while others are not sampled at all. To overcome this difficulty, a more general formulation of the bootstrap has been devised: the weighted (or smooth) bootstrap, which has also been shown to be computationally more efficient in several applications; we may refer to [22-24]. Holmes and Reinert [25] provided new proofs of many known results about the convergence in law of the bootstrap distribution to the true distribution of smooth statistics, employing techniques based on Stein's method for empirical processes. Note that other variations of Efron's bootstrap are studied in [26] under the name "generalized bootstrap." The practical usefulness of the more general scheme is well documented in the literature. For a survey of further results on the weighted bootstrap, the reader is referred to [27].
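For concreteness, Dirichlet$(n;1,\dots,1)$ weights as in (1.4) can be simulated from standard exponential spacings; the sketch below is a standard construction (our own illustration, not code from the paper) and returns weights scaled to sum to $n$, as required by condition (W.2) introduced later.

```r
# Bayesian bootstrap weights: n * Dirichlet(n; 1, ..., 1),
# generated from standard exponential variables.
bayes_weights <- function(n) {
  e <- rexp(n)
  n * e / sum(e)  # exchangeable, nonnegative, summing to n
}
```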

The remainder of this paper is organized as follows. In the forthcoming section we recall the estimation procedure based on 𝜙-divergences. The bootstrap of 𝜙-divergence estimators is introduced, in detail, and their asymptotic properties are given in Section 3. In Section 4, we provide some examples explaining the computation of the 𝜙-divergence estimators. In Section 5, we illustrate how to apply our results in the context of right censoring. Section 6 provides simulation results in order to illustrate the performance of the proposed estimators. To avoid interrupting the flow of the presentation, all mathematical developments are relegated to the appendix.

2. Dual Divergence-Based Estimates

The class of dual divergence estimators has recently been introduced by Keziou [28] and Broniatowski and Keziou [1]. Recall that the $\phi$-divergence between a bounded signed measure $Q$ and a probability measure $P$ on $\mathcal{D}$, when $Q$ is absolutely continuous with respect to $P$, is defined by
$$D_\phi(Q,P)=\int_{\mathcal{D}}\phi\Bigl(\frac{dQ}{dP}\Bigr)\,dP, \tag{2.1}$$

where $\phi(\cdot)$ is a convex function from $]-\infty,\infty[$ to $[0,\infty]$ with $\phi(1)=0$. We consider only $\phi$-divergences for which the function $\phi(\cdot)$ is strictly convex and whose domain, $\operatorname{dom}\phi=\{x:\phi(x)<\infty\}$, is an interval with end points
$$a_\phi<1<b_\phi,\qquad \phi\bigl(a_\phi\bigr)=\lim_{x\downarrow a_\phi}\phi(x),\qquad \phi\bigl(b_\phi\bigr)=\lim_{x\uparrow b_\phi}\phi(x). \tag{2.2}$$

The Kullback-Leibler, modified Kullback-Leibler, $\chi^2$, modified $\chi^2$, and Hellinger divergences are examples of $\phi$-divergences; they are obtained, respectively, for $\phi(x)=x\log x-x+1$, $\phi(x)=-\log x+x-1$, $\phi(x)=\frac12(x-1)^2$, $\phi(x)=\frac12(x-1)^2/x$, and $\phi(x)=2(\sqrt{x}-1)^2$. The squared Le Cam distance (sometimes called the Vincze-Le Cam distance) and the $L^1$-error are obtained, respectively, for
$$\phi(x)=\frac{(x-1)^2}{2(x+1)},\qquad \phi(x)=|x-1|. \tag{2.3}$$

We extend the definition of these divergences to the whole space of bounded signed measures via the extension of the corresponding $\phi(\cdot)$ functions to the whole real line, as follows: when $\phi(\cdot)$ is not well defined on $\mathbb{R}$, or is well defined but not convex on $\mathbb{R}$, we set $\phi(x)=+\infty$ for all $x<0$. Notice that, for the $\chi^2$-divergence, the corresponding $\phi(\cdot)$ function is defined and strictly convex on the whole of $\mathbb{R}$. All the above examples are particular cases of the so-called "power divergences," introduced by Cressie and Read [29] (see also [4, Chapter 2]; Rényi's paper [30] is also to be mentioned here), which are defined through the class of convex real-valued functions, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$x\in\mathbb{R}^*_+\mapsto\phi_\gamma(x)=\frac{x^\gamma-\gamma x+\gamma-1}{\gamma(\gamma-1)}, \tag{2.4}$$
$\phi_0(x)=-\log x+x-1$, and $\phi_1(x)=x\log x-x+1$. (For all $\gamma\in\mathbb{R}$, we define $\phi_\gamma(0)=\lim_{x\downarrow0}\phi_\gamma(x)$.) So the KL-divergence is associated with $\phi_1$, the KL$_m$ with $\phi_0$, the $\chi^2$ with $\phi_2$, the $\chi^2_m$ with $\phi_{-1}$, and the Hellinger distance with $\phi_{1/2}$. In the monograph [4], the reader may find detailed ingredients of the modeling theory as well as surveys of the commonly used divergences.

Let $\{P_\theta:\theta\in\Theta\}$ be some identifiable parametric model, with $\Theta$ a compact subset of $\mathbb{R}^d$. Consider the problem of estimating the unknown true value of the parameter $\theta_0$ on the basis of an i.i.d. sample $\mathbf{X}_1,\dots,\mathbf{X}_n$. We assume that the observed data come from the probability space $(\mathcal{X},\mathcal{A},P_{\theta_0})$. Let $\phi(\cdot)$ be a function of class $\mathcal{C}^2$, strictly convex, such that
$$\int\Bigl|\phi'\Bigl(\frac{dP_\theta(\mathbf{x})}{dP_\alpha(\mathbf{x})}\Bigr)\Bigr|\,dP_\theta(\mathbf{x})<\infty,\qquad \forall\,\alpha\in\Theta. \tag{2.5}$$
As mentioned in [1], if the function $\phi(\cdot)$ satisfies the following condition:
$$\text{there exists }0<\delta<1\text{ such that, for all }c\in[1-\delta,1+\delta],\text{ we can find numbers }c_1,c_2,c_3\text{ such that }\phi(cx)\le c_1\phi(x)+c_2|x|+c_3\ \ \forall\,x\in\mathbb{R}, \tag{2.6}$$
then assumption (2.5) is satisfied whenever $D_\phi(\theta,\alpha)<\infty$, where $D_\phi(\theta,\alpha)$ stands for the $\phi$-divergence between $P_\theta$ and $P_\alpha$; refer to [31, Lemma 3.2]. Also, the real convex functions $\phi(\cdot)$ in (2.4), associated with the class of power divergences, all satisfy condition (2.5), including all standard divergences. Under assumption (2.5), using the Fenchel duality technique, the divergence $D_\phi(\theta,\theta_0)$ can be represented as the result of an optimization procedure; this result was elegantly proved in [1, 3, 28]. Broniatowski and Keziou [31] called it the dual form of a divergence, due to its connection with convex analysis. According to [3], under the strict convexity and differentiability of the function $\phi(\cdot)$, it holds that
$$\phi(t)\ge\phi(s)+\phi'(s)(t-s), \tag{2.7}$$
where equality holds only for $s=t$. Let $\theta$ and $\theta_0$ be fixed, put $t=dP_\theta(\mathbf{x})/dP_{\theta_0}(\mathbf{x})$ and $s=dP_\theta(\mathbf{x})/dP_\alpha(\mathbf{x})$ in (2.7), and then integrate with respect to $P_{\theta_0}$ to obtain
$$D_\phi\bigl(\theta,\theta_0\bigr)=\int\phi\Bigl(\frac{dP_\theta}{dP_{\theta_0}}\Bigr)\,dP_{\theta_0}=\sup_{\alpha\in\Theta}\int h(\theta,\alpha)\,dP_{\theta_0}, \tag{2.8}$$
where $h(\theta,\alpha,\cdot):\mathbf{x}\mapsto h(\theta,\alpha,\mathbf{x})$ and
$$h(\theta,\alpha,\mathbf{x})=\int\phi'\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)\,dP_\theta-\Bigl[\frac{dP_\theta(\mathbf{x})}{dP_\alpha(\mathbf{x})}\,\phi'\Bigl(\frac{dP_\theta(\mathbf{x})}{dP_\alpha(\mathbf{x})}\Bigr)-\phi\Bigl(\frac{dP_\theta(\mathbf{x})}{dP_\alpha(\mathbf{x})}\Bigr)\Bigr]. \tag{2.9}$$
Furthermore, the supremum in display (2.8) is unique and attained at $\alpha=\theta_0$, independently of the value of $\theta$. Naturally, a class of estimators of $\theta_0$, called "dual $\phi$-divergence estimators" (D$\phi$DEs), is defined by
$$\widehat{\alpha}_\phi(\theta)=\arg\sup_{\alpha\in\Theta}\mathbb{P}_n h(\theta,\alpha),\qquad \theta\in\Theta, \tag{2.10}$$
where $h(\theta,\alpha)$ is the function defined in (2.9) and, for a measurable function $f(\cdot)$,
$$\mathbb{P}_n f=\frac{1}{n}\sum_{i=1}^{n}f\bigl(\mathbf{X}_i\bigr). \tag{2.11}$$
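For intuition, the dual criterion (2.9)-(2.10) can be coded generically whenever the model density is available. In the sketch below (our own, for a one-dimensional parameter), the inner integral is computed numerically, and the example instantiates the Kullback-Leibler pair $\phi$, $\phi'$ for the normal location model.

```r
# Generic dual criterion: alpha -> Pn h(theta, alpha), cf. (2.9)-(2.10).
dual_crit <- function(alpha, theta, x, dens, phi, phi1) {
  r <- function(u) dens(u, theta) / dens(u, alpha)   # density ratio
  first <- integrate(function(u) phi1(r(u)) * dens(u, theta),
                     lower = -Inf, upper = Inf)$value
  first - mean(r(x) * phi1(r(x)) - phi(r(x)))
}
# Kullback-Leibler pair and the N(m, 1) location model
phi  <- function(t) t * log(t) - t + 1
phi1 <- function(t) log(t)       # phi'
dens <- function(u, m) dnorm(u, mean = m)
set.seed(3)
x <- rnorm(50)
dual_crit(alpha = 0.2, theta = mean(x), x = x, dens = dens,
          phi = phi, phi1 = phi1)
```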

The class of estimators $\widehat{\alpha}_\phi(\theta)$ satisfies
$$\mathbb{P}_n\,\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}_\phi(\theta)\bigr)=0. \tag{2.12}$$
Formula (2.10) defines a family of $M$-estimators indexed by the function $\phi(\cdot)$ specifying the divergence and by some instrumental value of the parameter $\theta$. The $\phi$-divergence estimators are motivated by the fact that a suitable choice of the divergence may lead to an estimate more robust than the maximum likelihood estimator (MLE); see [32]. Toma and Broniatowski [33] studied the robustness of the D$\phi$DEs through the influence function approach; they treated numerous examples of location-scale models and gave sufficient conditions for the robustness of D$\phi$DEs. We recall that the maximum likelihood estimate belongs to the class of estimates (2.10). Indeed, it is obtained for $\phi(x)=-\log x+x-1$, that is, as the dual modified KL$_m$-divergence estimate. Observe that $\phi'(x)=-(1/x)+1$ and $x\phi'(x)-\phi(x)=\log x$, and hence
$$\int h(\theta,\alpha)\,d\mathbb{P}_n=-\int\log\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)\,d\mathbb{P}_n. \tag{2.13}$$

Keeping in mind definition (2.10), we get
$$\widehat{\alpha}_{KL_m}(\theta)=\arg\sup_{\alpha\in\Theta}\Bigl\{-\int\log\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)\,d\mathbb{P}_n\Bigr\}=\arg\sup_{\alpha\in\Theta}\int\log\bigl(dP_\alpha\bigr)\,d\mathbb{P}_n=\text{MLE}, \tag{2.14}$$
independently of $\theta$.

3. Asymptotic Properties

In this section, we establish the consistency of bootstrapping under general conditions in the framework of dual divergence estimation. Define, for a measurable function $f(\cdot)$,
$$\mathbb{P}^*_n f=\frac{1}{n}\sum_{i=1}^{n}W_{ni}\,f\bigl(\mathbf{X}_i\bigr), \tag{3.1}$$

where the $W_{ni}$'s are the bootstrap weights, defined on the probability space $(\mathcal{W},\Omega,P_W)$. In view of (2.10), the bootstrap estimator can be rewritten as
$$\widehat{\alpha}^*_\phi(\theta)=\arg\sup_{\alpha\in\Theta}\mathbb{P}^*_n h(\theta,\alpha). \tag{3.2}$$
The definition of $\widehat{\alpha}^*_\phi(\theta)$ in (3.2) implies that
$$\mathbb{P}^*_n\,\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)=0. \tag{3.3}$$
The bootstrap weights $W_{ni}$ are assumed to belong to the class of exchangeable bootstrap weights introduced in [23]. In the sequel, the transpose of a vector $\mathbf{x}$ will be denoted by $\mathbf{x}^\top$. We assume the following conditions.
(W.1) The vector $\mathbf{W}_n=(W_{n1},\dots,W_{nn})^\top$ is exchangeable for all $n=1,2,\dots$; that is, for any permutation $\pi=(\pi_1,\dots,\pi_n)$ of $(1,\dots,n)$, the joint distribution of $\pi(\mathbf{W}_n)=(W_{n\pi_1},\dots,W_{n\pi_n})^\top$ is the same as that of $\mathbf{W}_n$.
(W.2) $W_{ni}\ge0$ for all $n$, $i$, and $\sum_{i=1}^{n}W_{ni}=n$ for all $n$.
(W.3) $\limsup_{n\to\infty}\|W_{n1}\|_{2,1}\le C<\infty$, where
$$\|W_{n1}\|_{2,1}=\int_0^\infty\sqrt{P_W\bigl(W_{n1}\ge u\bigr)}\,du. \tag{3.4}$$
(W.4) One has
$$\lim_{\lambda\to\infty}\limsup_{n\to\infty}\sup_{t\ge\lambda}t^2\,P_W\bigl(W_{n1}>t\bigr)=0. \tag{3.5}$$
(W.5) $(1/n)\sum_{i=1}^{n}(W_{ni}-1)^2\xrightarrow{P_W}c^2>0$.

In Efron's nonparametric bootstrap, the bootstrap sample is drawn from the nonparametric estimate of the true distribution, that is, the empirical distribution. Thus, it is easy to show that $\mathbf{W}_n\sim\text{Multinomial}(n;n^{-1},\dots,n^{-1})$ and that conditions (W.1)-(W.5) are satisfied. In general, conditions (W.3)-(W.5) are easily satisfied under some moment conditions on $W_{ni}$; see [23, Lemma 3.1]. In addition to Efron's nonparametric bootstrap, the sampling schemes that satisfy conditions (W.1)-(W.5) include the Bayesian bootstrap, the multiplier bootstrap, the double bootstrap, and the urn bootstrap. This list is sufficiently long to indicate that conditions (W.1)-(W.5) are not unduly restrictive. Notice that the value of $c$ in (W.5) is independent of $n$ and depends on the resampling method; for example, $c=1$ for the nonparametric bootstrap and the Bayesian bootstrap, and $c=\sqrt{2}$ for the double bootstrap. A more precise discussion of this general formulation of the bootstrap can be found in [23, 34, 35].
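As a quick numerical sanity check (our own illustration, not part of the formal development), condition (W.5) can be verified by simulation: for both Efron's and the Bayesian bootstrap weights, $(1/n)\sum_{i=1}^n(W_{ni}-1)^2$ concentrates near $c^2=1$ for large $n$.

```r
set.seed(1)
n <- 10000
w_efron <- as.vector(rmultinom(1, n, rep(1 / n, n)))
e <- rexp(n); w_bayes <- n * e / sum(e)
mean((w_efron - 1)^2)  # approximately 1, so c = 1
mean((w_bayes - 1)^2)  # approximately 1, so c = 1
```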

There exist two sources of randomness for the bootstrapped quantity $\widehat{\alpha}^*_\phi(\theta)$: the first comes from the observed data, and the second is due to the resampling done by the bootstrap, that is, the random $W_{ni}$'s. Therefore, in order to rigorously state our main theoretical results for the general bootstrap of $\phi$-divergence estimates, we need to specify the relevant probability spaces and define stochastic orders with respect to the relevant probability measures. Following [6, 36], we view $\mathbf{X}_i$ as the $i$th coordinate projection from the canonical probability space $(\mathcal{X}^\infty,\mathcal{A}^\infty,P^\infty_{\theta_0})$ onto the $i$th copy of $\mathcal{X}$. For the joint randomness involved, the product probability space is defined as
$$\bigl(\mathcal{X}^\infty,\mathcal{A}^\infty,P^\infty_{\theta_0}\bigr)\times\bigl(\mathcal{W},\Omega,P_W\bigr)=\bigl(\mathcal{X}^\infty\times\mathcal{W},\,\mathcal{A}^\infty\times\Omega,\,P^\infty_{\theta_0}\times P_W\bigr). \tag{3.6}$$

Throughout the paper, we assume that the bootstrap weights $W_{ni}$ are independent of the data $\mathbf{X}_i$; thus
$$P_{XW}=P^\infty_{\theta_0}\times P_W. \tag{3.7}$$
Given a real-valued function $\Delta_n$ defined on the above product probability space, for example, $\widehat{\alpha}^*_\phi(\theta)$, we say that $\Delta_n$ is of order $o^o_{P_W}(1)$ in $P_{\theta_0}$-probability if, for any $\epsilon,\eta>0$, as $n\to\infty$,
$$P_{\theta_0}\Bigl(P^o_{W\mid X}\bigl(\bigl|\Delta_n\bigr|>\epsilon\bigr)>\eta\Bigr)\longrightarrow0, \tag{3.8}$$

and that $\Delta_n$ is of order $O^o_{P_W}(1)$ in $P_{\theta_0}$-probability if, for any $\eta>0$, there exists a $0<M<\infty$ such that, as $n\to\infty$,
$$P_{\theta_0}\Bigl(P^o_{W\mid X}\bigl(\bigl|\Delta_n\bigr|\ge M\bigr)>\eta\Bigr)\longrightarrow0, \tag{3.9}$$

where the superscript “𝑜” denotes the outer probability; see [34] for more details on outer probability measures. For more details on stochastic orders, the interested reader may refer to [6], in particular, Lemma 3 of the cited reference.

To establish the consistency of $\widehat{\alpha}^*_\phi(\theta)$, the following conditions are assumed in our analysis.
(A.1) One has
$$P_{\theta_0}h\bigl(\theta,\theta_0\bigr)>\sup_{\alpha\notin N(\theta_0)}P_{\theta_0}h(\theta,\alpha) \tag{3.10}$$
for any open set $N(\theta_0)\subset\Theta$ containing $\theta_0$.
(A.2) One has
$$\sup_{\alpha\in\Theta}\bigl|\mathbb{P}^*_n h(\theta,\alpha)-P_{\theta_0}h(\theta,\alpha)\bigr|\xrightarrow{P^o_{XW}}0. \tag{3.11}$$
The following theorem gives the consistency of the bootstrapped estimate $\widehat{\alpha}^*_\phi(\theta)$.

Theorem 3.1. Assume that conditions (A.1), (A.2), (A.3)-(A.5), and (W.1)-(W.5) hold. Then $\widehat{\alpha}^*_\phi(\theta)$ is a consistent estimate of $\theta_0$; that is,
$$\widehat{\alpha}^*_\phi(\theta)\xrightarrow{P^o_W}\theta_0\quad\text{in }P_{\theta_0}\text{-probability}. \tag{3.12}$$

The proof of Theorem 3.1 is postponed until the appendix.

We need the following definitions; refer to [34, 37], among others. If $\mathcal{F}$ is a class of functions for which we have, almost surely,
$$\bigl\|\mathbb{P}_n-P\bigr\|_{\mathcal{F}}=\sup_{f\in\mathcal{F}}\bigl|\mathbb{P}_n f-Pf\bigr|\longrightarrow0, \tag{3.13}$$

then we say that $\mathcal{F}$ is a $P$-Glivenko-Cantelli class of functions. If $\mathcal{F}$ is a class of functions for which
$$\mathbb{G}_n=\sqrt{n}\bigl(\mathbb{P}_n-P\bigr)\Rightarrow\mathbb{G}\quad\text{in }\ell^\infty(\mathcal{F}), \tag{3.14}$$

where $\mathbb{G}$ is a mean-zero $P$-Brownian bridge process with (uniformly) continuous sample paths with respect to the semimetric $\rho_P(f,g)$, defined by
$$\rho^2_P(f,g)=\operatorname{Var}_P\bigl(f(X)-g(X)\bigr), \tag{3.15}$$

then we say that $\mathcal{F}$ is a $P$-Donsker class of functions. Here
$$\ell^\infty(\mathcal{F})=\Bigl\{v:\mathcal{F}\to\mathbb{R}\ \Big|\ \|v\|_{\mathcal{F}}=\sup_{f\in\mathcal{F}}|v(f)|<\infty\Bigr\}, \tag{3.16}$$

and $\mathbb{G}$ is a $P$-Brownian bridge process on $\mathcal{F}$ if it is a mean-zero Gaussian process with covariance function
$$\mathbb{E}\bigl(\mathbb{G}(f)\,\mathbb{G}(g)\bigr)=P(fg)-P(f)P(g). \tag{3.17}$$

Remark 3.2. (i) Condition (A.1) is the "well-separatedness" condition: compactness of the parameter space $\Theta$ and continuity of the divergence imply that the optimum is well separated, provided the parametric model is identifiable; see [37, Theorem 5.7].
(ii) Condition (A.2) holds if the class
$$\bigl\{h(\theta,\alpha):\alpha\in\Theta\bigr\} \tag{3.18}$$
is shown to be $P$-Glivenko-Cantelli, by applying [34, Lemma 3.6.16] and [6, Lemma A.1].

For any fixed $\delta_n>0$, define the classes of functions $\mathcal{H}_n$ and $\dot{\mathcal{H}}_n$ as
$$\mathcal{H}_n=\Bigl\{\frac{\partial}{\partial\alpha}h(\theta,\alpha):\bigl\|\alpha-\theta_0\bigr\|\le\delta_n\Bigr\},\qquad \dot{\mathcal{H}}_n=\Bigl\{\frac{\partial^2}{\partial\alpha^2}h(\theta,\alpha):\bigl\|\alpha-\theta_0\bigr\|\le\delta_n\Bigr\}. \tag{3.19}$$
We say a class of functions $\mathcal{H}\in M(P_{\theta_0})$ if $\mathcal{H}$ possesses enough measurability for randomization with i.i.d. multipliers to be possible, that is, $\mathbb{P}_n$ can be randomized; in other words, we can replace $(\delta_{\mathbf{X}_i}-P_{\theta_0})$ by $(W_{ni}-1)\delta_{\mathbf{X}_i}$. It is known that $\mathcal{H}\in M(P_{\theta_0})$, for example, if $\mathcal{H}$ is countable, if $\{\mathbb{P}_n\}^\infty_n$ are stochastically separable in $\mathcal{H}$, or if $\mathcal{H}$ is image admissible Suslin; see [21, pages 853 and 854].

To state our result concerning asymptotic normality, we assume the following additional conditions.
(A.3) The matrices
$$V=P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\,\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)^\top\Bigr],\qquad S=-P_{\theta_0}\,\frac{\partial^2}{\partial\alpha^2}h\bigl(\theta,\theta_0\bigr) \tag{3.20}$$
are nonsingular.
(A.4) The class $\mathcal{H}_n\in M(P_{\theta_0})\cap L_2(P_{\theta_0})$ and is $P$-Donsker.
(A.5) The class $\dot{\mathcal{H}}_n\in M(P_{\theta_0})\cap L_2(P_{\theta_0})$ and is $P$-Donsker.

Conditions (A.4) and (A.5) ensure that the "size" of the function classes $\mathcal{H}_n$ and $\dot{\mathcal{H}}_n$ is reasonable, so that the bootstrapped empirical processes
$$\mathbb{G}^*_n=\sqrt{n}\bigl(\mathbb{P}^*_n-\mathbb{P}_n\bigr), \tag{3.21}$$

indexed, respectively, by $\mathcal{H}_n$ and $\dot{\mathcal{H}}_n$, have a limiting process conditionally on the original observations; we refer, for instance, to [23, Theorem 2.2]. The main result to be proved here may now be stated precisely as follows.

Theorem 3.3. Assume that $\widehat{\alpha}_\phi(\theta)$ and $\widehat{\alpha}^*_\phi(\theta)$ fulfill (2.12) and (3.3), respectively. In addition, suppose that
$$\widehat{\alpha}_\phi(\theta)\xrightarrow{P_{\theta_0}}\theta_0,\qquad \widehat{\alpha}^*_\phi(\theta)\xrightarrow{P^o_W}\theta_0\quad\text{in }P_{\theta_0}\text{-probability}. \tag{3.22}$$
Assume that conditions (A.3)-(A.5) and (W.1)-(W.5) hold. Then one has
$$\bigl\|\widehat{\alpha}^*_\phi(\theta)-\theta_0\bigr\|=O^o_{P_W}\bigl(n^{-1/2}\bigr) \tag{3.23}$$
in $P_{\theta_0}$-probability. Furthermore,
$$\sqrt{n}\bigl(\widehat{\alpha}^*_\phi(\theta)-\widehat{\alpha}_\phi(\theta)\bigr)=S^{-1}\,\mathbb{G}^*_n\,\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)+o^o_{P_W}(1) \tag{3.24}$$
in $P_{\theta_0}$-probability. Consequently,
$$\sup_{\mathbf{x}\in\mathbb{R}^d}\Bigl|P_{W\mid\mathcal{X}_n}\Bigl(\frac{\sqrt{n}}{c}\bigl(\widehat{\alpha}^*_\phi(\theta)-\widehat{\alpha}_\phi(\theta)\bigr)\le\mathbf{x}\Bigr)-P\bigl(N(0,\Sigma)\le\mathbf{x}\bigr)\Bigr|=o_{P_{\theta_0}}(1), \tag{3.25}$$
where $P_{W\mid\mathcal{X}_n}$ denotes the bootstrap probability conditionally on the data $\mathcal{X}_n=(\mathbf{X}_1,\dots,\mathbf{X}_n)$, "$\le$" is taken componentwise, "$c$" is given in (W.5) and its value depends on the sampling scheme used, and
$$\Sigma=S^{-1}\,V\,S^{-1}, \tag{3.26}$$
where $S$ and $V$ are given in condition (A.3). Thus, one has
$$\sup_{\mathbf{x}\in\mathbb{R}^d}\Bigl|P_{W\mid\mathcal{X}_n}\Bigl(\frac{\sqrt{n}}{c}\bigl(\widehat{\alpha}^*_\phi(\theta)-\widehat{\alpha}_\phi(\theta)\bigr)\le\mathbf{x}\Bigr)-P_{\theta_0}\Bigl(\sqrt{n}\bigl(\widehat{\alpha}_\phi(\theta)-\theta_0\bigr)\le\mathbf{x}\Bigr)\Bigr|\xrightarrow{P_{\theta_0}}0. \tag{3.27}$$

The proof of Theorem 3.3 is postponed to the appendix.

Remark 3.4. Note that an appropriate choice of the bootstrap weights $W_{ni}$ can yield a smaller limiting variance, that is, $c^2$ smaller than 1. Typical examples are i.i.d.-weighted bootstraps and the multivariate hypergeometric bootstrap; refer to [23, Examples 3.1 and 3.4].

Following [6], we illustrate how to apply our results to construct confidence sets. A lower $\epsilon$th quantile of the bootstrap distribution is defined to be any $q^*_{n\epsilon}\in\mathbb{R}^d$ fulfilling
$$q^*_{n\epsilon}=\inf\Bigl\{\mathbf{x}:P_{W\mid\mathcal{X}_n}\bigl(\widehat{\alpha}^*_\phi(\theta)\le\mathbf{x}\bigr)\ge\epsilon\Bigr\}, \tag{3.28}$$

where $\mathbf{x}$ is an infimum over the given set only if there does not exist $\mathbf{x}_1<\mathbf{x}$ in $\mathbb{R}^d$ such that
$$P_{W\mid\mathcal{X}_n}\bigl(\widehat{\alpha}^*_\phi(\theta)\le\mathbf{x}_1\bigr)\ge\epsilon. \tag{3.29}$$

Keeping in mind the assumed regularity conditions on the criterion function, that is, $h(\theta,\alpha)$ in the present framework, we can, without loss of generality, suppose that
$$P_{W\mid\mathcal{X}_n}\bigl(\widehat{\alpha}^*_\phi(\theta)\le q^*_{n\epsilon}\bigr)=\epsilon. \tag{3.30}$$

Making use of the distribution consistency result given in (3.27), we can approximate the $\epsilon$th quantile of the distribution of $\widehat{\alpha}_\phi(\theta)-\theta_0$ by
$$\frac{q^*_{n\epsilon}-\widehat{\alpha}_\phi(\theta)}{c}. \tag{3.31}$$

Therefore, we define the percentile-type bootstrap confidence set as
$$C(\epsilon)=\Biggl[\widehat{\alpha}_\phi(\theta)+\frac{q^*_{n(\epsilon/2)}-\widehat{\alpha}_\phi(\theta)}{c},\ \widehat{\alpha}_\phi(\theta)+\frac{q^*_{n(1-\epsilon/2)}-\widehat{\alpha}_\phi(\theta)}{c}\Biggr]. \tag{3.32}$$
In a similar manner, the $\epsilon$th quantile of $\sqrt{n}(\widehat{\alpha}_\phi(\theta)-\theta_0)$ can be approximated by $\tilde{q}^*_{n\epsilon}$, where $\tilde{q}^*_{n\epsilon}$ is the $\epsilon$th quantile of the hybrid quantity $(\sqrt{n}/c)(\widehat{\alpha}^*_\phi(\theta)-\widehat{\alpha}_\phi(\theta))$, that is,
$$P_{W\mid\mathcal{X}_n}\Bigl(\frac{\sqrt{n}}{c}\bigl(\widehat{\alpha}^*_\phi(\theta)-\widehat{\alpha}_\phi(\theta)\bigr)\le\tilde{q}^*_{n\epsilon}\Bigr)=\epsilon. \tag{3.33}$$

Note that
$$\tilde{q}^*_{n\epsilon}=\frac{\sqrt{n}}{c}\bigl(q^*_{n\epsilon}-\widehat{\alpha}_\phi(\theta)\bigr). \tag{3.34}$$

Thus, the hybrid-type bootstrap confidence set would be defined as follows:
$$C(\epsilon)=\Biggl[\widehat{\alpha}_\phi(\theta)-\frac{\tilde{q}^*_{n(1-\epsilon/2)}}{\sqrt{n}},\ \widehat{\alpha}_\phi(\theta)-\frac{\tilde{q}^*_{n(\epsilon/2)}}{\sqrt{n}}\Biggr]. \tag{3.35}$$
Note that $q^*_{n\epsilon}$ and $\tilde{q}^*_{n\epsilon}$ are not unique, since $\theta$ is a vector. Recall that, for any $\mathbf{x}\in\mathbb{R}^d$,
$$P_{\theta_0}\Bigl(\sqrt{n}\bigl(\widehat{\alpha}_\phi(\theta)-\theta_0\bigr)\le\mathbf{x}\Bigr)\longrightarrow\Psi(\mathbf{x}),\qquad P_{W\mid\mathcal{X}_n}\Bigl(\frac{\sqrt{n}}{c}\bigl(\widehat{\alpha}^*_\phi(\theta)-\widehat{\alpha}_\phi(\theta)\bigr)\le\mathbf{x}\Bigr)\xrightarrow{P_{\theta_0}}\Psi(\mathbf{x}), \tag{3.36}$$

where
$$\Psi(\mathbf{x})=P\bigl(N(0,\Sigma)\le\mathbf{x}\bigr). \tag{3.37}$$

According to the quantile convergence theorem, that is, [37, Lemma 21.1], we have, $P_{XW}$-almost surely,
$$\tilde{q}^*_{n\epsilon}\longrightarrow\Psi^{-1}(\epsilon). \tag{3.38}$$

When applying the quantile convergence theorem, we use the almost sure representation, that is, [37, Theorem 2.19], and argue along subsequences. Slutsky's theorem then ensures that
$$\sqrt{n}\bigl(\widehat{\alpha}_\phi(\theta)-\theta_0\bigr)-\tilde{q}^*_{n(\epsilon/2)}\ \text{weakly converges to}\ N(0,\Sigma)-\Psi^{-1}(\epsilon/2), \tag{3.39}$$

and we further have
$$P_{XW}\Bigl(\theta_0\le\widehat{\alpha}_\phi(\theta)-\frac{\tilde{q}^*_{n(\epsilon/2)}}{\sqrt{n}}\Bigr)=P_{XW}\Bigl(\sqrt{n}\bigl(\widehat{\alpha}_\phi(\theta)-\theta_0\bigr)\ge\tilde{q}^*_{n(\epsilon/2)}\Bigr)\longrightarrow P\Bigl(N(0,\Sigma)\ge\Psi^{-1}\Bigl(\frac{\epsilon}{2}\Bigr)\Bigr)=1-\frac{\epsilon}{2}. \tag{3.40}$$

The above arguments prove the consistency of the hybrid-type bootstrap confidence set, that is, (3.42), and can also be applied to the percentile-type bootstrap confidence set, that is, (3.41). For an in-depth study and a more rigorous proof, we may refer to [37, Lemma 23.3]. The above discussion may be summarized as follows.

Corollary 3.5. Under the conditions of Theorem 3.3, one has, as $n\to\infty$,
$$P_{XW}\Biggl(\widehat{\alpha}_\phi(\theta)+\frac{q^*_{n(\epsilon/2)}-\widehat{\alpha}_\phi(\theta)}{c}\le\theta_0\le\widehat{\alpha}_\phi(\theta)+\frac{q^*_{n(1-\epsilon/2)}-\widehat{\alpha}_\phi(\theta)}{c}\Biggr)\longrightarrow1-\epsilon, \tag{3.41}$$
$$P_{XW}\Biggl(\widehat{\alpha}_\phi(\theta)-\frac{\tilde{q}^*_{n(1-\epsilon/2)}}{\sqrt{n}}\le\theta_0\le\widehat{\alpha}_\phi(\theta)-\frac{\tilde{q}^*_{n(\epsilon/2)}}{\sqrt{n}}\Biggr)\longrightarrow1-\epsilon. \tag{3.42}$$

It is well known that the above bootstrap confidence sets can be obtained easily through routine bootstrap sampling.
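To make the routine explicit, here is a small R sketch (our own illustration, for a scalar parameter) that turns bootstrap replicates into the two confidence sets above; `est` is the D$\phi$DE computed from the data, `est_boot` the vector of bootstrap D$\phi$DEs, `n` the sample size, and $c=1$ for Efron's and the Bayesian bootstrap.

```r
# Percentile-type (3.32) and hybrid-type (3.35) confidence sets
# from bootstrap replicates est_boot of a scalar estimate est.
boot_ci <- function(est, est_boot, n, eps = 0.05, c = 1) {
  q <- unname(quantile(est_boot, c(eps / 2, 1 - eps / 2)))  # q*_{n,eps}
  percentile <- est + (q - est) / c
  qt <- (sqrt(n) / c) * (q - est)        # tilde q*_{n,eps}, cf. (3.34)
  hybrid <- c(est - qt[2] / sqrt(n), est - qt[1] / sqrt(n))
  list(percentile = percentile, hybrid = hybrid)
}
```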

Remark 3.6. Notice that the choice of the weights depends on the problem at hand: accuracy of the estimation of the entire distribution of the statistic, accuracy of a confidence interval, accuracy in the large deviation sense, and accuracy for a finite sample size; we may refer to [38] and the references therein for more details. Barbe and Bertail [27] indicate that the area where the weighted bootstrap clearly performs better than the classical bootstrap is in terms of coverage accuracy.

3.1. On the Choice of the Escort Parameter

The very peculiar choice of the escort parameter defined through $\theta=\theta_0$ gives the D$\phi$DE the same limit properties as the MLE. The D$\phi$DE $\widehat{\alpha}_\phi(\theta_0)$, in this case, has a variance which indeed coincides with that of the MLE; see, for instance, [28, Theorem 2.2, (1)(b)]. This result is of some relevance, since it leaves open the choice of the divergence while keeping good asymptotic properties. For data generated from the distribution $\mathcal{N}(0,1)$, Figure 1 shows that the global maximum of the empirical criterion $\alpha\mapsto\mathbb{P}_n h(\widehat{\theta}_n,\alpha)$ is zero, independently of the value of the escort parameter $\widehat{\theta}_n$ (the sample mean $\bar{X}=n^{-1}\sum_{i=1}^{n}\mathbf{X}_i$ in Figure 1(a) and the median in Figure 1(b)), for all the considered divergences. This is in agreement with [39, Theorem 6], where it is shown that all differentiable divergences produce the same estimator of the parameter on any regular exponential family, in particular the normal models, namely the MLE, provided that condition (2.6) holds and $D_\phi(\theta,\alpha)<\infty$.

Unlike the case of data without contamination, the choice of the escort parameter is crucial for the estimation method in the presence of outliers. We plot in Figure 2 the empirical criterion $\alpha\mapsto\mathbb{P}_n h(\widehat{\theta}_n,\alpha)$, where the data are generated from the distribution
$$(1-\epsilon)\,\mathcal{N}\bigl(\theta_0,1\bigr)+\epsilon\,\delta_{10}, \tag{3.43}$$

where $\epsilon=0.1$, $\theta_0=0$, and $\delta_x$ stands for the Dirac measure at $x$. Under contamination, when we take the empirical mean, $\widehat{\theta}_n=\bar{X}$, as the value of the escort parameter $\theta$, Figure 2(a) shows how the global maximum of the empirical criterion $\mathbb{P}_n h(\widehat{\theta}_n,\alpha)$ shifts from zero towards the contamination point. In Figure 2(b), the choice of the median as the escort parameter value leaves the position of the global maximum close to $\alpha=0$ for the Hellinger ($\gamma=0.5$), $\chi^2$ ($\gamma=2$), and KL ($\gamma=1$) divergences, while the criterion associated with the KL$_m$-divergence ($\gamma=0$, whose maximizer is the MLE) is still affected by the presence of outliers.

In practice, the consequence is that if the data are subject to contamination the escort parameter should be chosen as a robust estimator of 𝜽0, say 𝜽𝑛. For more details about the performances of dual 𝜙-divergence estimators for normal density models, we refer to [40].
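The following small experiment (our own illustration, not code from the paper) reproduces the spirit of Figure 2 for the KL criterion (4.19) in the normal model: with contaminated data, the criterion with the mean as escort typically peaks away from 0, while the median escort typically keeps the maximizer near 0.

```r
set.seed(42)
n <- 100; eps <- 0.1
x <- c(rnorm(round((1 - eps) * n)), rep(10, round(eps * n)))
# Empirical KL criterion (4.19) for the N(alpha, 1) model with escort theta
kl_crit <- function(alpha, theta, x)
  0.5 * (theta - alpha)^2 -
    mean(exp(-0.5 * (theta - alpha) * (theta + alpha - 2 * x))) + 1
alpha <- seq(-2, 12, by = 0.01)
crit_mean   <- sapply(alpha, kl_crit, theta = mean(x),   x = x)
crit_median <- sapply(alpha, kl_crit, theta = median(x), x = x)
alpha[which.max(crit_mean)]    # typically pulled away from 0 toward the contamination
alpha[which.max(crit_median)]  # typically stays near the true value 0
```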

4. Examples

Keep in mind definitions (2.8) and (2.9). In what follows, for easy reference and completeness, we give some usual examples of divergences, discussed in [41, 42], and the associated estimates; we may refer also to [43] for more examples and details.

(i) Our first example is the Kullback-Leibler divergence:
$$\phi(x)=x\log x-x+1,\qquad \phi'(x)=\log x,\qquad x\phi'(x)-\phi(x)=x-1. \tag{4.1}$$
The estimate of $D_{KL}(\theta,\theta_0)$ is given by
$$\widehat{D}_{KL}\bigl(\theta,\theta_0\bigr)=\sup_{\alpha\in\Theta}\Biggl\{\int\log\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)\,dP_\theta-\int\Bigl(\frac{dP_\theta}{dP_\alpha}-1\Bigr)\,d\mathbb{P}_n\Biggr\}, \tag{4.2}$$
and the estimate of the parameter $\theta_0$, with escort parameter $\theta$, is defined as follows:
$$\widehat{\alpha}_{KL}(\theta)=\arg\sup_{\alpha\in\Theta}\Biggl\{\int\log\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)\,dP_\theta-\int\Bigl(\frac{dP_\theta}{dP_\alpha}-1\Bigr)\,d\mathbb{P}_n\Biggr\}. \tag{4.3}$$

(ii) The second one is the $\chi^2$-divergence:
$$\phi(x)=\frac12(x-1)^2,\qquad \phi'(x)=x-1,\qquad x\phi'(x)-\phi(x)=\frac12\bigl(x^2-1\bigr). \tag{4.4}$$
The estimate of $D_{\chi^2}(\theta,\theta_0)$ is given by
$$\widehat{D}_{\chi^2}\bigl(\theta,\theta_0\bigr)=\sup_{\alpha\in\Theta}\Biggl\{\int\Bigl(\frac{dP_\theta}{dP_\alpha}-1\Bigr)\,dP_\theta-\frac12\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^2-1\Bigr]\,d\mathbb{P}_n\Biggr\}, \tag{4.5}$$
and the estimate of the parameter $\theta_0$, with escort parameter $\theta$, is defined by
$$\widehat{\alpha}_{\chi^2}(\theta)=\arg\sup_{\alpha\in\Theta}\Biggl\{\int\Bigl(\frac{dP_\theta}{dP_\alpha}-1\Bigr)\,dP_\theta-\frac12\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^2-1\Bigr]\,d\mathbb{P}_n\Biggr\}. \tag{4.6}$$

(iii) Another example is the Hellinger divergence:
$$\phi(x)=2\bigl(\sqrt{x}-1\bigr)^2,\qquad \phi'(x)=2-\frac{2}{\sqrt{x}},\qquad x\phi'(x)-\phi(x)=2\sqrt{x}-2. \tag{4.7}$$
The estimate of $D_{H}(\theta,\theta_0)$ is given by
$$\widehat{D}_{H}\bigl(\theta,\theta_0\bigr)=\sup_{\alpha\in\Theta}\Biggl\{2-2\int\sqrt{\frac{dP_\alpha}{dP_\theta}}\,dP_\theta-2\int\Bigl(\sqrt{\frac{dP_\theta}{dP_\alpha}}-1\Bigr)\,d\mathbb{P}_n\Biggr\}, \tag{4.8}$$
and the estimate of the parameter $\theta_0$, with escort parameter $\theta$, is defined by
$$\widehat{\alpha}_{H}(\theta)=\arg\sup_{\alpha\in\Theta}\Biggl\{2-2\int\sqrt{\frac{dP_\alpha}{dP_\theta}}\,dP_\theta-2\int\Bigl(\sqrt{\frac{dP_\theta}{dP_\alpha}}-1\Bigr)\,d\mathbb{P}_n\Biggr\}. \tag{4.9}$$

(iv) All the above examples are particular cases of the so-called "power divergences," which are defined through the class of convex real-valued functions, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$x\in\mathbb{R}^*_+\mapsto\varphi_\gamma(x)=\frac{x^\gamma-\gamma x+\gamma-1}{\gamma(\gamma-1)}. \tag{4.10}$$
The estimate of $D_\gamma(\theta,\theta_0)$ is given by
$$\widehat{D}_\gamma\bigl(\theta,\theta_0\bigr)=\sup_{\alpha\in\Theta}\Biggl\{\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^{\gamma-1}-1\Bigr]\,dP_\theta-\frac{1}{\gamma}\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^{\gamma}-1\Bigr]\,d\mathbb{P}_n\Biggr\}, \tag{4.11}$$
and the parameter estimate is defined by
$$\widehat{\alpha}_\gamma(\theta)=\arg\sup_{\alpha\in\Theta}\Biggl\{\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^{\gamma-1}-1\Bigr]\,dP_\theta-\frac{1}{\gamma}\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^{\gamma}-1\Bigr]\,d\mathbb{P}_n\Biggr\}. \tag{4.12}$$

Remark 4.1. The computation of the estimate $\widehat{\alpha}_\phi(\theta)$ requires the evaluation of the integral in formula (2.9). This integral can be explicitly calculated for most standard parametric models. Below, we give closed-form expressions for the Normal, log-Normal, Exponential, Gamma, Weibull, and Pareto density models. Hence, the computation of $\widehat{\alpha}_\phi(\theta)$ can be performed by any standard nonlinear optimization code. Unfortunately, an explicit formula for $\widehat{\alpha}_\phi(\theta)$ generally cannot be derived, which is also the case for the ML method. In practical problems, to obtain the estimate $\widehat{\alpha}_\phi(\theta)$, one can use the Newton-Raphson algorithm, taking the escort parameter $\theta$ as initial point. This algorithm is a powerful technique for solving equations numerically and performs well here, since the objective function $\alpha\in\Theta\mapsto P_{\theta_0}h(\theta,\alpha)$ is concave and the maximizer of $\alpha\in\Theta\mapsto\mathbb{P}_n h(\theta,\alpha)$ is unique; for instance, refer to [1, Remark 3.5].
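As an illustration of this strategy, the following sketch (our own; `crit` stands for a hypothetical scalar map $\alpha\mapsto\int h(\theta,\alpha)\,d\mathbb{P}_n$) runs Newton-Raphson from the escort value $\theta$, with derivatives approximated by central finite differences.

```r
# Newton-Raphson for a scalar parameter, started at the escort theta.
newton_dphi <- function(crit, theta, tol = 1e-8, max_iter = 50, dx = 1e-5) {
  alpha <- theta
  for (k in seq_len(max_iter)) {
    d1 <- (crit(alpha + dx) - crit(alpha - dx)) / (2 * dx)             # crit'
    d2 <- (crit(alpha + dx) - 2 * crit(alpha) + crit(alpha - dx)) / dx^2  # crit''
    step <- d1 / d2
    alpha <- alpha - step   # ascent step for a concave criterion
    if (abs(step) < tol) break
  }
  alpha
}
```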

4.1. Example of Normal Density

Consider the case of power divergences and the Normal model
$$\bigl\{\mathcal{N}\bigl(\theta,\sigma^2\bigr):\bigl(\theta,\sigma^2\bigr)\in\Theta=\mathbb{R}\times\mathbb{R}^*_+\bigr\}. \tag{4.13}$$

Set
$$p_{\theta,\sigma}(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\Biggl(-\frac12\Bigl(\frac{x-\theta}{\sigma}\Bigr)^2\Biggr). \tag{4.14}$$

Simple calculus gives, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{p_{\theta,\sigma_1}(x)}{p_{\alpha,\sigma_2}(x)}\Bigr)^{\gamma-1}-1\Bigr]p_{\theta,\sigma_1}(x)\,dx=\frac{1}{\gamma-1}\Biggl[\frac{\sigma_2^{\gamma}}{\sigma_1^{\gamma-1}\sqrt{\gamma\sigma_2^2-(\gamma-1)\sigma_1^2}}\exp\Biggl(\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2\bigl(\gamma\sigma_2^2-(\gamma-1)\sigma_1^2\bigr)}\Biggr)-1\Biggr], \tag{4.15}$$
provided that $\gamma\sigma_2^2-(\gamma-1)\sigma_1^2>0$.

This yields
$$\widehat{D}_\gamma\bigl(\bigl(\theta,\sigma_1\bigr),\bigl(\theta_0,\sigma_0\bigr)\bigr)=\sup_{(\alpha,\sigma_2)}\Biggl\{\frac{1}{\gamma-1}\,\frac{\sigma_2^{\gamma}}{\sigma_1^{\gamma-1}\sqrt{\gamma\sigma_2^2-(\gamma-1)\sigma_1^2}}\exp\Biggl(\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2\bigl(\gamma\sigma_2^2-(\gamma-1)\sigma_1^2\bigr)}\Biggr)-\frac{1}{\gamma n}\sum_{i=1}^{n}\Bigl(\frac{\sigma_2}{\sigma_1}\Bigr)^{\gamma}\exp\Biggl(-\frac{\gamma}{2}\Biggl[\Bigl(\frac{\mathbf{X}_i-\theta}{\sigma_1}\Bigr)^2-\Bigl(\frac{\mathbf{X}_i-\alpha}{\sigma_2}\Bigr)^2\Biggr]\Biggr)-\frac{1}{\gamma(\gamma-1)}\Biggr\}. \tag{4.16}$$

In the particular case $\{\mathcal{N}(\theta,1):\theta\in\mathbb{R}\}$, it follows that, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\widehat{D}_\gamma\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\int h(\theta,\alpha)\,d\mathbb{P}_n=\sup_{\alpha}\Biggl\{\frac{1}{\gamma-1}\exp\Biggl(\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2}\Biggr)-\frac{1}{\gamma n}\sum_{i=1}^{n}\exp\Bigl(-\frac{\gamma}{2}(\theta-\alpha)\bigl(\theta+\alpha-2\mathbf{X}_i\bigr)\Bigr)-\frac{1}{\gamma(\gamma-1)}\Biggr\}. \tag{4.17}$$

For $\gamma=0$,
$$\widehat{D}_{KL_m}\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\int h(\theta,\alpha)\,d\mathbb{P}_n=\sup_{\alpha}\Biggl\{\frac{1}{2n}\sum_{i=1}^{n}(\theta-\alpha)\bigl(\theta+\alpha-2\mathbf{X}_i\bigr)\Biggr\}, \tag{4.18}$$

which leads to the maximum likelihood estimate, independently of $\theta$.

For $\gamma=1$,
$$\widehat{D}_{KL}\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\int h(\theta,\alpha)\,d\mathbb{P}_n=\sup_{\alpha}\Biggl\{\frac12(\theta-\alpha)^2-\frac{1}{n}\sum_{i=1}^{n}\exp\Bigl(-\frac12(\theta-\alpha)\bigl(\theta+\alpha-2\mathbf{X}_i\bigr)\Bigr)+1\Biggr\}. \tag{4.19}$$
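As a concrete illustration (our own sketch, not code from the paper), the criterion (4.17) can be evaluated directly on the data and maximized with `optimise` over a bounded interval; $\gamma=0.5$ gives the Hellinger case.

```r
# Power-divergence dual criterion (4.17) for the N(alpha, 1) model,
# gamma not in {0, 1}.
pd_crit <- function(alpha, theta, x, gamma) {
  (1 / (gamma - 1)) * exp(gamma * (gamma - 1) * (theta - alpha)^2 / 2) -
    mean(exp(-(gamma / 2) * (theta - alpha) * (theta + alpha - 2 * x))) / gamma -
    1 / (gamma * (gamma - 1))
}
set.seed(7)
x <- rnorm(100)
theta <- mean(x)  # escort: the MLE
optimise(pd_crit, interval = c(-3, 3), maximum = TRUE,
         theta = theta, x = x, gamma = 0.5)$maximum
```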

4.2. Example of Log-Normal Density

Consider the case of power divergences and the log-Normal model
$$p_{\theta,\sigma}(x)=\frac{1}{x\sigma\sqrt{2\pi}}\exp\Biggl(-\frac12\Bigl(\frac{\log x-\theta}{\sigma}\Bigr)^2\Biggr),\qquad \bigl(\theta,\sigma^2\bigr)\in\Theta=\mathbb{R}\times\mathbb{R}^*_+,\ x>0. \tag{4.20}$$

Simple calculus gives, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{p_{\theta,\sigma_1}(x)}{p_{\alpha,\sigma_2}(x)}\Bigr)^{\gamma-1}-1\Bigr]p_{\theta,\sigma_1}(x)\,dx=\frac{1}{\gamma-1}\Biggl[\frac{\sigma_2^{\gamma}}{\sigma_1^{\gamma-1}\sqrt{\gamma\sigma_2^2-(\gamma-1)\sigma_1^2}}\exp\Biggl(\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2\bigl(\gamma\sigma_2^2-(\gamma-1)\sigma_1^2\bigr)}\Biggr)-1\Biggr]. \tag{4.21}$$

This yields
$$\widehat{D}_\gamma\bigl(\bigl(\theta,\sigma_1\bigr),\bigl(\theta_0,\sigma_0\bigr)\bigr)=\sup_{(\alpha,\sigma_2)}\Biggl\{\frac{1}{\gamma-1}\,\frac{\sigma_2^{\gamma}}{\sigma_1^{\gamma-1}\sqrt{\gamma\sigma_2^2-(\gamma-1)\sigma_1^2}}\exp\Biggl(\frac{\gamma(\gamma-1)(\theta-\alpha)^2}{2\bigl(\gamma\sigma_2^2-(\gamma-1)\sigma_1^2\bigr)}\Biggr)-\frac{1}{\gamma n}\sum_{i=1}^{n}\Bigl(\frac{\sigma_2}{\sigma_1}\Bigr)^{\gamma}\exp\Biggl(-\frac{\gamma}{2}\Biggl[\Bigl(\frac{\log\mathbf{X}_i-\theta}{\sigma_1}\Bigr)^2-\Bigl(\frac{\log\mathbf{X}_i-\alpha}{\sigma_2}\Bigr)^2\Biggr]\Biggr)-\frac{1}{\gamma(\gamma-1)}\Biggr\}. \tag{4.22}$$

4.3. Example of Exponential Density

Consider the case of power divergences and the Exponential model
$$\bigl\{p_\theta(x)=\theta\exp(-\theta x):\theta\in\Theta=\mathbb{R}^*_+\bigr\}. \tag{4.23}$$

We have, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{p_\theta(x)}{p_\alpha(x)}\Bigr)^{\gamma-1}-1\Bigr]p_\theta(x)\,dx=\frac{1}{\gamma-1}\Biggl[\Bigl(\frac{\theta}{\alpha}\Bigr)^{\gamma-1}\frac{\theta}{\gamma\theta-(\gamma-1)\alpha}-1\Biggr], \tag{4.24}$$
provided that $\gamma\theta-(\gamma-1)\alpha>0$. Then, using this last equality, one finds
$$\widehat{D}_\gamma\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\Biggl\{\frac{1}{\gamma-1}\Bigl(\frac{\theta}{\alpha}\Bigr)^{\gamma-1}\frac{\theta}{\gamma\theta-(\gamma-1)\alpha}-\frac{1}{\gamma n}\sum_{i=1}^{n}\Bigl(\frac{\theta}{\alpha}\Bigr)^{\gamma}\exp\bigl(-\gamma(\theta-\alpha)\mathbf{X}_i\bigr)-\frac{1}{\gamma(\gamma-1)}\Biggr\}. \tag{4.25}$$
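Similarly, a sketch for the exponential criterion (4.25) (our own illustration; note the implicit constraint $\gamma\theta-(\gamma-1)\alpha>0$, automatically satisfied here for $\gamma=0.5$):

```r
# Power-divergence dual criterion (4.25) for the exponential model,
# gamma not in {0, 1}.
pd_crit_exp <- function(alpha, theta, x, gamma) {
  (theta / alpha)^(gamma - 1) * theta /
    ((gamma - 1) * (gamma * theta - (gamma - 1) * alpha)) -
    mean((theta / alpha)^gamma * exp(-gamma * (theta - alpha) * x)) / gamma -
    1 / (gamma * (gamma - 1))
}
set.seed(11)
x <- rexp(200, rate = 1)
theta <- 1 / mean(x)  # escort: the MLE of the rate
optimise(pd_crit_exp, interval = c(0.1, 5), maximum = TRUE,
         theta = theta, x = x, gamma = 0.5)$maximum
```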

In a more general case, we may consider the Gamma density combined with the power divergence. The Gamma model is defined by
$$p_\theta(x;k)=\frac{\theta^k x^{k-1}\exp(-x\theta)}{\Gamma(k)},\qquad k>0,\ \theta>0,\ x\ge0, \tag{4.26}$$

where $\Gamma(\cdot)$ is the Gamma function
$$\Gamma(k)=\int_0^\infty x^{k-1}\exp(-x)\,dx. \tag{4.27}$$

Simple calculus gives, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{p_\theta(x;k)}{p_\alpha(x;k)}\Bigr)^{\gamma-1}-1\Bigr]p_\theta(x;k)\,dx=\frac{1}{\gamma-1}\Biggl[\Bigl(\frac{\theta}{\alpha}\Bigr)^{k(\gamma-1)}\Bigl(\frac{\theta}{\gamma\theta-(\gamma-1)\alpha}\Bigr)^{k}-1\Biggr], \tag{4.28}$$

which implies that
$$\widehat{D}_\gamma\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\Biggl\{\frac{1}{\gamma-1}\Bigl(\frac{\theta}{\alpha}\Bigr)^{k(\gamma-1)}\Bigl(\frac{\theta}{\gamma\theta-(\gamma-1)\alpha}\Bigr)^{k}-\frac{1}{\gamma n}\sum_{i=1}^{n}\Bigl(\frac{\theta}{\alpha}\Bigr)^{k\gamma}\exp\bigl(-\gamma(\theta-\alpha)\mathbf{X}_i\bigr)-\frac{1}{\gamma(\gamma-1)}\Biggr\}. \tag{4.29}$$

4.4. Example of Weibull Density

Consider the case of power divergences and the Weibull density model, with the assumption that $k\in\mathbb{R}^*_+$ is known and $\theta$ is the parameter of interest to be estimated; recall that
$$p_\theta(x)=\frac{k}{\theta}\Bigl(\frac{x}{\theta}\Bigr)^{k-1}\exp\Bigl(-\Bigl(\frac{x}{\theta}\Bigr)^{k}\Bigr),\qquad \theta\in\Theta=\mathbb{R}^*_+,\ x\ge0. \tag{4.30}$$

Routine algebra gives, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{p_\theta(x;k)}{p_\alpha(x;k)}\Bigr)^{\gamma-1}-1\Bigr]p_\theta(x;k)\,dx=\frac{1}{\gamma-1}\Biggl[\Bigl(\frac{\alpha}{\theta}\Bigr)^{k(\gamma-1)}\frac{1}{\gamma-(\gamma-1)(\theta/\alpha)^{k}}-1\Biggr], \tag{4.31}$$
provided that $\gamma-(\gamma-1)(\theta/\alpha)^{k}>0$,

which implies that
$$\widehat{D}_\gamma\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\Biggl\{\frac{1}{\gamma-1}\Bigl(\frac{\alpha}{\theta}\Bigr)^{k(\gamma-1)}\frac{1}{\gamma-(\gamma-1)(\theta/\alpha)^{k}}-\frac{1}{\gamma n}\sum_{i=1}^{n}\Bigl(\frac{\alpha}{\theta}\Bigr)^{k\gamma}\exp\Biggl(-\gamma\Biggl[\Bigl(\frac{\mathbf{X}_i}{\theta}\Bigr)^{k}-\Bigl(\frac{\mathbf{X}_i}{\alpha}\Bigr)^{k}\Biggr]\Biggr)-\frac{1}{\gamma(\gamma-1)}\Biggr\}. \tag{4.32}$$

4.5. Example of the Pareto Density

Consider the case of power divergences and the Pareto density
$$p_\theta(x)=\frac{\theta}{x^{\theta+1}},\qquad x>1,\ \theta\in\mathbb{R}^*_+. \tag{4.33}$$

Simple calculus gives, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{p_\theta(x)}{p_\alpha(x)}\Bigr)^{\gamma-1}-1\Bigr]p_\theta(x)\,dx=\frac{1}{\gamma-1}\Biggl[\Bigl(\frac{\theta}{\alpha}\Bigr)^{\gamma-1}\frac{\theta}{\gamma\theta-(\gamma-1)\alpha}-1\Biggr]. \tag{4.34}$$
As before, using this last equality, one finds
$$\widehat{D}_\gamma\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\Biggl\{\frac{1}{\gamma-1}\Bigl(\frac{\theta}{\alpha}\Bigr)^{\gamma-1}\frac{\theta}{\gamma\theta-(\gamma-1)\alpha}-\frac{1}{\gamma n}\sum_{i=1}^{n}\Bigl(\frac{\theta}{\alpha}\Bigr)^{\gamma}\mathbf{X}_i^{-\gamma(\theta-\alpha)}-\frac{1}{\gamma(\gamma-1)}\Biggr\}. \tag{4.35}$$

For $\gamma=0$,
$$\widehat{D}_{KL_m}\bigl(\theta,\theta_0\bigr)=\sup_{\alpha}\int h(\theta,\alpha)\,d\mathbb{P}_n=\sup_{\alpha}\Biggl\{-\frac{1}{n}\sum_{i=1}^{n}\Bigl[\log\Bigl(\frac{\theta}{\alpha}\Bigr)-(\theta-\alpha)\log\mathbf{X}_i\Bigr]\Biggr\}, \tag{4.36}$$

which leads to the maximum likelihood estimate, given by
$$\Biggl(\frac{1}{n}\sum_{i=1}^{n}\log\mathbf{X}_i\Biggr)^{-1}, \tag{4.37}$$

independently of $\theta$.

Remark 4.2. The choice of divergence, that is, of the statistical criterion, depends crucially on the problem at hand. For example, among the various divergences, the $\chi^2$-divergence is more appropriate in nonstandard problems (e.g., boundary estimation problems). The idea is to embed the parameter domain $\Theta$ in an enlarged space, say $\Theta_e$, in order to render the boundary value an interior point of the new parameter space $\Theta_e$. Indeed, the Kullback-Leibler, modified Kullback-Leibler, modified $\chi^2$, and Hellinger divergences are infinite when $dQ/dP$ takes negative values on a nonnegligible (with respect to $P$) subset of the support of $P$, since the corresponding $\phi(\cdot)$ is infinite on $(-\infty,0)$ when $\theta$ belongs to $\Theta_e\setminus\Theta$. This problem does not arise for the $\chi^2$-divergence; in fact, the corresponding $\phi(\cdot)$ is finite on $\mathbb{R}$; for more details, refer to [41, 42, 44], and consult also [1, 45] for related matters. It is well known that when the underlying model is misspecified or when the data are contaminated, the maximum likelihood and other classical parametric methods may be severely affected and lead to very poor results. Therefore, robust methods, which automatically circumvent contamination effects and model misspecification, can be used to provide a compromise between efficient classical parametric methods and the semiparametric approach, provided they are reasonably efficient at the model; this problem has been investigated in [46, 47]. In [41, 42], simulation results show that the choice of the $\chi^2$-divergence has good properties in terms of efficiency-robustness. We mention that some progress has been made on automatic data-based selection of the tuning parameter $\alpha>0$ appearing in formula (1) of [47]; the interested reader is referred to [48, 49]. It is mentioned in [50], where semiparametric minimum distance estimators are considered, that the MLE or inversion-type estimators involve solving a nonlinear equation which depends on some initial value. A second difficulty is that the objective function is not convex in $\theta$, in general, which gives rise to multiple roots. Thus, in general, "good" consistent initial estimates are necessary, and the D$\phi$DE should serve that purpose.

5. Random Right Censoring

Let $T_1,\dots,T_n$ be i.i.d. survival times with continuous survival function $1-F_{\theta_0}(t)=P_{\theta_0}(T>t)$, and let $C_1,\dots,C_n$ be independent censoring times with d.f. $G(\cdot)$. In the censoring setup, we observe only the pairs $Y_i=\min(T_i,C_i)$ and $\delta_i=\mathbb{1}\{T_i\le C_i\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function of the event $\{\cdot\}$, which indicates whether an observation has been censored or not. Let $(Y_1,\delta_1),\dots,(Y_n,\delta_n)$ denote the observed data points, and let
$$t_{(1)}<t_{(2)}<\cdots<t_{(k)} \tag{5.1}$$

be the $k$ distinct death times. Now define the death set and the risk set as follows: for $j=1,\dots,k$,
$$D(j)=\bigl\{i:y_i=t_{(j)},\,\delta_i=1\bigr\},\qquad R(j)=\bigl\{i:y_i\ge t_{(j)}\bigr\}. \tag{5.2}$$

Kaplan and Meier's [51] estimator of $1-F_{\theta_0}(\cdot)$, denoted here by $1-\widehat{F}_n(\cdot)$, may be written as follows:
$$1-\widehat{F}_n(t)=\prod_{j=1}^{k}\Biggl[1-\frac{\sum_{q\in D(j)}1}{\sum_{q\in R(j)}1}\Biggr]^{\mathbb{1}\{t_{(j)}\le t\}}. \tag{5.3}$$

One may define a generally exchangeable weighted bootstrap scheme for the Kaplan-Meier estimator and related functionals as follows (cf. [38, page 1598]):
$$1-\widehat{F}^*_n(t)=\prod_{j=1}^{k}\Biggl[1-\frac{\sum_{q\in D(j)}W_{nq}}{\sum_{q\in R(j)}W_{nq}}\Biggr]^{\mathbb{1}\{t_{(j)}\le t\}}. \tag{5.4}$$

Let $\psi(\cdot)$ be $F_{\theta_0}$-integrable, and put
$$\Psi^*_n=\int\psi(u)\,d\widehat{F}^*_n(u)=\sum_{j=1}^{k}\Upsilon^*_{jn}\,\psi\bigl(t_{(j)}\bigr), \tag{5.5}$$

where
$$\Upsilon^*_{jn}=\frac{\sum_{q\in D(j)}W_{nq}}{\sum_{q\in R(j)}W_{nq}}\prod_{k'=1}^{j-1}\Biggl[1-\frac{\sum_{q\in D(k')}W_{nq}}{\sum_{q\in R(k')}W_{nq}}\Biggr]. \tag{5.6}$$

Note that we have used the following identity: for real numbers $a_i$ and $b_i$, $i=1,\dots,k$,
$$\prod_{i=1}^{k}a_i-\prod_{i=1}^{k}b_i=\sum_{i=1}^{k}\Biggl(\prod_{j=1}^{i-1}b_j\Biggr)\bigl(a_i-b_i\bigr)\Biggl(\prod_{\ell=i+1}^{k}a_\ell\Biggr). \tag{5.7}$$

In a similar way, we define a more convenient representation, which will be used in the sequel, as follows:
$$\Psi^*_n=\int\psi(u)\,d\widehat{F}^*_n(u)=\sum_{j=1}^{n}\pi^*_{jn}\,\psi\bigl(Y_{jn}\bigr), \tag{5.8}$$

where, for $1\le j\le n$,
$$\pi^*_{jn}=\frac{\delta_{jn}\sum_{q\in D(j)}W_{nq}}{\sum_{q\in R(j)}W_{nq}}\prod_{k'=1}^{j-1}\Biggl[1-\frac{\sum_{q\in D(k')}W_{nq}}{\sum_{q\in R(k')}W_{nq}}\Biggr]^{\delta_{k'n}}. \tag{5.9}$$

Here, $Y_{1n}\le\cdots\le Y_{nn}$ are the ordered $Y$-values and $\delta_{jn}$ denotes the concomitant associated with $Y_{jn}$. Hence, we may write
$$\widehat{F}^*_n=\sum_{j=1}^{n}\pi^*_{jn}\,\delta_{Y_{jn}}. \tag{5.10}$$
For the right censoring situation, the bootstrap D$\phi$DE is defined by replacing $\mathbb{P}_n$ in (2.10) by $\widehat{F}^*_n$, that is,
$$\widehat{\alpha}^*_n(\theta)=\arg\sup_{\alpha\in\Theta}\int h(\theta,\alpha)\,d\widehat{F}^*_n,\qquad \theta\in\Theta. \tag{5.11}$$
The corresponding estimating equation for the unknown parameter is then given by
$$\int\frac{\partial}{\partial\alpha}h(\theta,\alpha)\,d\widehat{F}^*_n=0, \tag{5.12}$$
where we recall that
$$h(\theta,\alpha,x)=\int\phi'\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)\,dP_\theta-\Bigl[\frac{dP_\theta(x)}{dP_\alpha(x)}\,\phi'\Bigl(\frac{dP_\theta(x)}{dP_\alpha(x)}\Bigr)-\phi\Bigl(\frac{dP_\theta(x)}{dP_\alpha(x)}\Bigr)\Bigr]. \tag{5.13}$$

Formula (5.11) defines a family of $M$-estimators for censored data. In the case of the power-divergences family (2.4), it follows from (4.11) that
$$\int h(\theta,\alpha)\,d\widehat{F}_n=\frac{1}{\gamma-1}\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^{\gamma-1}-1\Bigr]\,dP_\theta-\frac{1}{\gamma}\int\Bigl[\Bigl(\frac{dP_\theta}{dP_\alpha}\Bigr)^{\gamma}-1\Bigr]\,d\widehat{F}_n,\qquad \gamma\in\mathbb{R}\setminus\{0,1\}, \tag{5.14}$$

where
$$\widehat{F}_n=\sum_{j=1}^{n}\omega_{jn}\,\delta_{Y_{jn}}, \tag{5.15}$$

and, for $1\le j\le n$,
$$\omega_{jn}=\frac{\delta_{jn}}{n-j+1}\prod_{i=1}^{j-1}\Bigl(\frac{n-i}{n-i+1}\Bigr)^{\delta_{in}}. \tag{5.16}$$
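The Kaplan-Meier weights (5.16) are straightforward to compute once the observations are sorted; the following sketch (our own, not code from the paper) returns them along with the ordered $Y$'s. The bootstrap weights $\pi^*_{jn}$ of (5.9) are obtained by the same recipe, replacing the counts by sums of the $W_{nq}$'s.

```r
# Kaplan-Meier jumps omega_{jn} of (5.16), attached to the ordered Y's.
km_weights <- function(y, delta) {
  n <- length(y)
  ord <- order(y)
  d <- delta[ord]                               # concomitants delta_{jn}
  j <- seq_len(n)
  cum <- cumprod(((n - j) / (n - j + 1))^d)     # product over i <= j
  w <- d / (n - j + 1) * c(1, cum[-n])          # product over i < j
  list(y_ord = y[ord], delta_ord = d, w = w)
}
# A Kaplan-Meier integral (5.17) is then sum(w * h(theta, alpha, y_ord)).
```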

Consider the lifetime distribution to be the one-parameter exponential $\exp(\theta)$, with density $\theta e^{-\theta x}$, $x\ge0$. Following [52], the Kaplan-Meier integral $\int h(\theta,\alpha)\,d\widehat{F}_n$ may be written as
$$\sum_{j=1}^{n}\omega_{jn}\,h\bigl(\theta,\alpha,Y_{jn}\bigr). \tag{5.17}$$

The MLE of $\theta_0$ is given by
$$\widehat{\theta}_{n,\mathrm{MLE}}=\frac{\sum_{j=1}^{n}\delta_j}{\sum_{j=1}^{n}Y_j}, \tag{5.18}$$
and the approximate MLE (AMLE) of [53] is defined by
$$\widehat{\theta}_{n,\mathrm{AMLE}}=\frac{n^{-1}\sum_{j=1}^{n}\delta_{jn}}{\sum_{j=1}^{n}\omega_{jn}Y_{jn}}. \tag{5.19}$$
We infer from (4.24) that, for $\gamma\in\mathbb{R}\setminus\{0,1\}$,
$$\int h(\theta,\alpha)\,d\widehat{F}_n=\theta^{\gamma}\alpha^{1-\gamma}\bigl[(\gamma-1)\bigl(\gamma\theta+(1-\gamma)\alpha\bigr)\bigr]^{-1}-\frac{1}{\gamma}\sum_{j=1}^{n}\omega_{jn}\Bigl[\Bigl(\frac{\theta}{\alpha}\Bigr)^{\gamma}\exp\bigl(-\gamma(\theta-\alpha)Y_{jn}\bigr)-1\Bigr]. \tag{5.20}$$

For $\gamma=0$,
$$\int h(\theta,\alpha)\,d\widehat{F}_n=\sum_{j=1}^{n}\omega_{jn}\Bigl[(\theta-\alpha)Y_{jn}-\log\Bigl(\frac{\theta}{\alpha}\Bigr)\Bigr]. \tag{5.21}$$

Observe that this divergence leads to the AMLE, independently of the value of $\theta$.

For $\gamma=1$,
$$\int h(\theta,\alpha)\,d\widehat{F}_n=\log\Bigl(\frac{\theta}{\alpha}\Bigr)-\frac{\theta-\alpha}{\theta}-\sum_{j=1}^{n}\omega_{jn}\Bigl[\frac{\theta}{\alpha}\exp\bigl(-(\theta-\alpha)Y_{jn}\bigr)-1\Bigr]. \tag{5.22}$$

For more details about dual $\phi$-divergence estimators under right censoring, we refer to [54]; we leave this study open for future research. We mention that the bootstrapped estimators in this framework are obtained by replacing the weights $\omega_{jn}$ by $\pi^*_{jn}$ in the preceding formulas.

6. Simulations

In this section, a series of experiments was conducted to examine the performance of the proposed random weighted bootstrap procedure for the D$\phi$DEs defined in (3.2). We provide numerical illustrations regarding the mean squared error (MSE) and the coverage probabilities. The computing program codes were implemented in R.

The values of $\gamma$ are chosen to be $-1$, $0$, $0.5$, $1$, and $2$, which correspond, as indicated above, to the well-known standard divergences: the $\chi^2_m$-divergence, KL$_m$, the Hellinger distance, KL, and the $\chi^2$-divergence, respectively. The sample sizes considered in our simulations are 25, 50, 75, 100, 150, and 200, and the estimates, the D$\phi$DEs $\widehat{\alpha}_\phi(\theta)$, are obtained from 500 independent runs. The value of the escort parameter $\theta$ is taken to be the MLE, which, under the model, is a consistent estimate of $\theta_0$; the limiting distribution of the D$\phi$DE $\widehat{\alpha}_\phi(\theta_0)$ then has a variance which coincides with that of the MLE (refer to [28, Theorem 2.2, (1)(b)], as mentioned in Section 3.1). The bootstrap weights are chosen to be
$$\bigl(W_{n1},\dots,W_{nn}\bigr)\sim\text{Dirichlet}(n;1,\dots,1). \tag{6.1}$$
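For reference, one replicate of the weighted bootstrap procedure used in these experiments can be sketched as follows (our own code, for the normal case of (4.17) with the Hellinger choice $\gamma=0.5$); repeating the last two lines $B$ times yields the bootstrap distribution from which the quantiles of Section 3 are extracted.

```r
set.seed(123)
n <- 100
x <- rnorm(n)          # data from N(theta0 = 0, 1)
theta <- mean(x)       # escort: the MLE
gamma <- 0.5           # Hellinger distance
# weighted dual criterion, cf. (3.2) and (4.17)
crit <- function(alpha, w)
  (1 / (gamma - 1)) * exp(gamma * (gamma - 1) * (theta - alpha)^2 / 2) -
    mean(w * exp(-(gamma / 2) * (theta - alpha) * (theta + alpha - 2 * x))) / gamma -
    1 / (gamma * (gamma - 1))
# D-phi-DE: unit weights; bootstrap D-phi-DE: Dirichlet weights (6.1)
alpha_hat <- optimise(crit, c(-3, 3), maximum = TRUE, w = rep(1, n))$maximum
e <- rexp(n); w_boot <- n * e / sum(e)
alpha_boot <- optimise(crit, c(-3, 3), maximum = TRUE, w = w_boot)$maximum
```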

In Figure 3, we plot the densities of the different estimates; the figure shows that the proposed estimators perform reasonably well.

Tables 1 and 2 provide the MSE of various estimates under the Normal model 𝑁(𝜃0=0,1). Here, we mention that the KL-based estimator (𝛾=1) is more efficient than the other competitors.

Tables 3 and 4 provide the MSE of various estimates under the Exponential model $\exp(\theta_0=1)$. As expected, the MLE produces the most efficient estimates. A close look at the results of the simulations shows that the D$\phi$DEs perform well under the model. For the large sample size $n=200$, the estimator based on the Hellinger distance is equivalent to the MLE. Indeed, in terms of empirical MSE, the D$\phi$DE with $\gamma=0.5$ produces the same MSE as the MLE, while the performance of the other estimators is comparable.

Tables 5, 6, 7, and 8 provide the empirical coverage probabilities of the corresponding 0.95 weighted bootstrap confidence intervals based on $B=500$ and $1000$ weighted bootstrap estimators. As in any other inferential context, the empirical coverage probabilities improve as the sample size increases. From the results reported in these tables, we find that, for large values of the sample size $n$, the empirical coverage probabilities are all close to the nominal level. One can see that the D$\phi$DE with $\gamma=2$ has the best empirical coverage probability, which is near the assigned nominal level.

6.1. Right Censoring Case

This subsection presents some simulations for the right censoring case discussed in Section 5. A sample is generated from $\exp(1)$ and an exponential censoring scheme is used; the censoring distribution is taken to be $\exp(1/9)$, so that the proportion of censoring is 10%. To study the robustness properties of our estimators, 20% of the observations are contaminated by $\exp(5)$. The D$\phi$DEs $\widehat{\alpha}_\phi(\theta)$ are calculated for samples of sizes 25, 50, 100, and 150, and the whole procedure is repeated 500 times. We can see from Table 9 that the D$\phi$DEs perform well under the model in terms of MSE and are an attractive alternative to the AMLE.

Table 10 shows the variation in coverage of nominal 95% asymptotic confidence intervals according to the sample size. Clearly, the confidence intervals suffer from undercoverage: the D$\phi$DEs have poor coverage probabilities due to the censoring effect. However, for small and moderate sample sizes, the D$\phi$DEs associated with $\gamma=2$ outperform the AMLE.

Under contamination, the performance of our estimators decreases considerably. Such findings are evidence of the need for more adequate procedures for right-censored data (Tables 11 and 12).

Remark 6.1. In order to extract methodological recommendations for the use of an appropriate divergence, it will be interesting to conduct extensive Monte Carlo experiments for several divergences or investigate theoretically the problem of the choice of the divergence which leads to an “optimal” (in some sense) estimate in terms of efficiency and robustness, which would go well beyond the scope of the present paper. Another challenging task is how to choose the bootstrap weights for a given divergence in order to obtain, for example, an efficient estimator.

Appendix

This section is devoted to the proofs of our results. The previously defined notation continues to be used in the following.

Proof of Theorem 3.1. Proceeding as in the proof of the argmax theorem of [34], that is, Corollary 3.2.3 there, it is straightforward to show the consistency of the bootstrapped estimates $\widehat{\alpha}^*_\phi(\theta)$.

Remark A.1. Note that the proof techniques of Theorem 3.3 are largely inspired by that of Cheng and Huang [6] and changes have been made in order to adapt them to our purpose.

Proof of Theorem 3.3. Keep in mind the following definitions:
$$\mathbb{G}_n=\sqrt{n}\bigl(\mathbb{P}_n-P_{\theta_0}\bigr),\qquad \mathbb{G}^*_n=\sqrt{n}\bigl(\mathbb{P}^*_n-\mathbb{P}_n\bigr). \tag{A.1}$$
In view of the fact that $P_{\theta_0}(\partial/\partial\alpha)h(\theta,\theta_0)=0$, a little calculation shows that
$$\mathbb{G}_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)+\mathbb{G}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)+\sqrt{n}\,P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]=\mathbb{G}_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)\Bigr]+\mathbb{G}^*_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)\Bigr]+\sqrt{n}\,\mathbb{P}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr). \tag{A.2}$$
Consequently, we have the inequality
$$\sqrt{n}\,\Bigl\|P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]\Bigr\|\le\Bigl\|\mathbb{G}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr\|+\Bigl\|\mathbb{G}_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr\|+\Bigl\|\mathbb{G}^*_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]\Bigr\|+\Bigl\|\mathbb{G}_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]\Bigr\|+\sqrt{n}\,\Bigl\|\mathbb{P}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)\Bigr\|\equiv G_1+G_2+G_3+G_4+G_5. \tag{A.3}$$
According to [23, Theorem 2.2], under condition (A.4), we have $G_1=O^o_{P_W}(1)$ in $P_{\theta_0}$-probability. In view of the CLT, $G_2=O_{P_{\theta_0}}(1)$. By a Taylor series expansion, we have
$$\mathbb{G}^*_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]=\bigl(\widehat{\alpha}^*_\phi(\theta)-\theta_0\bigr)^\top\,\mathbb{G}^*_n\frac{\partial^2}{\partial\alpha^2}h\bigl(\theta,\bar{\alpha}\bigr), \tag{A.4}$$
where $\bar{\alpha}$ lies between $\widehat{\alpha}^*_\phi(\theta)$ and $\theta_0$. By condition (A.5) and [23, Theorem 2.2], we conclude that the right-hand side of (A.4) is of order $O^o_{P_W}(\|\widehat{\alpha}^*_\phi(\theta)-\theta_0\|)$ in $P_{\theta_0}$-probability. Since $\widehat{\alpha}^*_\phi(\theta)$ is assumed to be consistent, we have $G_3=o^o_{P_W}(1)$ in $P_{\theta_0}$-probability. An analogous argument yields that
$$\mathbb{G}_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr] \tag{A.5}$$
is of order $O_{P_{\theta_0}}(\|\widehat{\alpha}^*_\phi(\theta)-\theta_0\|)$; by the consistency of $\widehat{\alpha}^*_\phi(\theta)$, we have $G_4=o^o_{P_W}(1)$ in $P_{\theta_0}$-probability. Finally, $G_5=0$ by (3.3). In summary, (A.3) can be rewritten as
$$\sqrt{n}\,\Bigl\|P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]\Bigr\|\le O^o_{P_W}(1)+O_{P_{\theta_0}}(1) \tag{A.6}$$
in $P_{\theta_0}$-probability. On the other hand, by a Taylor series expansion, we can write
$$P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h(\theta,\alpha)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]=-\bigl(\alpha-\theta_0\bigr)^\top S+O\bigl(\|\alpha-\theta_0\|^2\bigr). \tag{A.7}$$
Combining (A.7) with (A.6), we infer that
$$\sqrt{n}\,\bigl\|S\bigl(\widehat{\alpha}^*_\phi(\theta)-\theta_0\bigr)\bigr\|\le O^o_{P_W}(1)+O_{P_{\theta_0}}(1)+O^o_{P_W}\bigl(\sqrt{n}\,\bigl\|\widehat{\alpha}^*_\phi(\theta)-\theta_0\bigr\|^2\bigr) \tag{A.8}$$
in $P_{\theta_0}$-probability; considering again the consistency of $\widehat{\alpha}^*_\phi(\theta)$ together with condition (A.3), and making use of (A.8), completes the proof of (3.23). We next prove (3.24). Introduce
$$H_1=\mathbb{G}_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)\Bigr],\qquad H_2=\mathbb{G}_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr],$$
$$H_3=\mathbb{G}^*_n\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)\Bigr],\qquad H_4=\sqrt{n}\,\Bigl[\mathbb{P}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\mathbb{P}_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}_\phi(\theta)\bigr)\Bigr]. \tag{A.9}$$
By some algebra, we obtain
$$\sqrt{n}\,P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}_\phi(\theta)\bigr)\Bigr]+\mathbb{G}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)=\sum_{j=1}^{4}H_j. \tag{A.10}$$
Obviously, $H_1=O^o_{P_W}(n^{-1/2})$ in $P_{\theta_0}$-probability and $H_2=O_{P_{\theta_0}}(n^{-1/2})$. We also know that the order of $H_3$ is $O^o_{P_W}(n^{-1/2})$ in $P_{\theta_0}$-probability. Using (2.12) and (3.3), we obtain $H_4=0$. Therefore, we have established
$$\sqrt{n}\,P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}_\phi(\theta)\bigr)\Bigr]=-\mathbb{G}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)+o_{P_{\theta_0}}(1)+o^o_{P_W}(1) \tag{A.11}$$
in $P_{\theta_0}$-probability. To analyze the left-hand side of (A.11), we rewrite it as
$$\sqrt{n}\,P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}^*_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]-\sqrt{n}\,P_{\theta_0}\Bigl[\frac{\partial}{\partial\alpha}h\bigl(\theta,\widehat{\alpha}_\phi(\theta)\bigr)-\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)\Bigr]. \tag{A.12}$$
By a Taylor expansion, that is, (A.7) applied to each term of (A.12), together with the root-$n$ rates of $\widehat{\alpha}_\phi(\theta)$ and $\widehat{\alpha}^*_\phi(\theta)$, we obtain
$$\sqrt{n}\,S\bigl(\widehat{\alpha}^*_\phi(\theta)-\widehat{\alpha}_\phi(\theta)\bigr)=\mathbb{G}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)+o_{P_{\theta_0}}(1)+o^o_{P_W}(1)+O_{P_{\theta_0}}\bigl(n^{-1/2}\bigr)+O^o_{P_W}\bigl(n^{-1/2}\bigr)=\mathbb{G}^*_n\frac{\partial}{\partial\alpha}h\bigl(\theta,\theta_0\bigr)+o_{P_{\theta_0}}(1)+o^o_{P_W}(1) \tag{A.13}$$
in $P_{\theta_0}$-probability. Keeping in mind that, under condition (A.3), the matrix $S$ is nonsingular, we multiply both sides of (A.13) by $S^{-1}$ to obtain (3.24). An application of [23, Lemma 4.6] under the bootstrap weight conditions then implies (3.25). Using [1, Theorem 3.2] and [37, Lemma 2.11], it easily follows that
$$\sup_{\mathbf{x}\in\mathbb{R}^d}\Bigl|P_{\theta_0}\Bigl(\sqrt{n}\bigl(\widehat{\alpha}_\phi(\theta)-\theta_0\bigr)\le\mathbf{x}\Bigr)-P\bigl(N(0,\Sigma)\le\mathbf{x}\bigr)\Bigr|=o_{P_{\theta_0}}(1). \tag{A.14}$$
By combining (3.25) and (A.14), we readily obtain the desired conclusion (3.27).

Acknowledgments

The authors are grateful to the referees, whose insightful comments helped to improve an early draft of this paper greatly. The authors are indebted to Amor Keziou for careful reading and fruitful discussions on the subject. They would like to thank the associate editor for comments which helped in the completion of this work.