#### Abstract

The classical $t$ (or $T^2$ in high dimensions) inference procedure for an unknown mean is so fundamental in statistics and so prevalent in practice that many practitioners regard it as an optimal procedure. In this manuscript we present a new procedure based on data depth trimming and bootstrapping that can outperform the classical $t$ (or $T^2$ in high dimensions) confidence interval (or region) procedure.

#### 1. Introduction

Let $X_1, \dots, X_n$ be a random sample from a distribution $F$ with an unknown mean parameter $\mu$. The most prevalent procedure for estimating $\mu$ is the classical $t$ confidence interval. A $100(1-2\alpha)\%$ confidence interval (CI) for $\mu$ and large $n$ is

$$\bar{x} \pm t_{\alpha}(n-1)\,\frac{s}{\sqrt{n}}, \tag{1.1}$$

where $\bar{x}$ is the standard sample mean, $s$ is the standard sample deviation, and $t_{\alpha}(n-1)$ is the $\alpha$th upper quantile of a $t$ distribution with $n-1$ degrees of freedom. The rule of thumb in most textbooks ties the use of the procedure to the sample size $n$: for small $n$, do not use the $t$ procedure; for moderate $n$, do not use it if outliers are present; for large $n$, use it. The procedure is based on the large sample property and the central limit theorem, so it is not exact but an approximation for large sample size and arbitrary population distribution.

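In higher dimensions, the counterpart to procedure (1.1) is the celebrated Hotelling's $T^2$ procedure. A $100(1-2\alpha)\%$ confidence region for the unknown vector $\mu \in \mathbb{R}^d$ and large $n$ is the region

$$\left\{ \mu : n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu) \le \chi^2_{2\alpha}(d) \right\}, \tag{1.2}$$

where $\bar{X}$ is the sample mean vector, $S$ is the sample covariance matrix, and $\chi^2_{2\alpha}(d)$ is the upper $2\alpha$th quantile of the $\chi^2$ distribution with $d$ degrees of freedom.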

Procedures (1.1) and (1.2) are so prevalent in practice that, in many practitioners' minds, they are regarded as optimal and unbeatable procedures. Are they really unbeatable? In this manuscript we introduce a new procedure that can outperform these seemingly optimal procedures.

The rest of the paper is organized as follows. Section 2 introduces the new procedure and Section 3 conducts some simulation studies. The paper ends in Section 4 with some concluding remarks.

#### 2. A New Procedure for the Unknown Mean

##### 2.1. A Univariate Location Estimator

It is well known that the sample mean in the $t$ procedure is extremely sensitive to outliers, heavy-tailed distributions, or contamination. The procedure therefore is not *robust*. So naturally, one would replace the sample mean with a robust counterpart. We will utilize a special univariate location estimator to replace the sample mean in the $t$ procedure.

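Now we consider a special univariate “projection depth-trimmed mean” ($\mathrm{PTM}^{\beta}$) for $X^n = \{X_1, \dots, X_n\}$ in $\mathbb{R}^1$ (see Wu and Zuo [1] in $\mathbb{R}^1$; also see Zuo [2] for a multidimensional version):

$$\mathrm{PTM}^{\beta}(X^n) = \frac{1}{k} \sum_{j=1}^{k} X_{i_j},$$

where $i_1, \dots, i_k$ are distinct numbers from $\{1, \dots, n\}$ such that $\mathrm{PD}(X_{i_j}, X^n) \ge \beta$ for some $0 < \beta < 1$, and $\mathrm{PD}(x, X^n) = 1/\bigl(1 + |x - \mathrm{Med}(X^n)|/\mathrm{MAD}(X^n)\bigr)$; here Med is the standard sample median and MAD is the standard median of absolute deviations (see Zuo and Serfling [3] and Zuo [4] for the study of PD in high dimensions).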

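Let $F_n$ be the empirical distribution based on $X^n$, which places mass $1/n$ at the points $X_i$, $i = 1, \dots, n$. We sometimes write $F_n$ (for $X^n$) for convenience. Let $F$ be the distribution of $X_i$. Replacing $X^n$ with $F$ in the above definition, we obtain the population version. For example, the population version of $\mathrm{PD}(x, X^n)$ for $x \in \mathbb{R}^1$ is

$$\mathrm{PD}(x, F) = \frac{1}{1 + |x - \mathrm{Med}(F)|/\mathrm{MAD}(F)}.$$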
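The univariate estimator can be illustrated with a minimal sketch. It assumes the depth of a point is $1/(1 + |x - \mathrm{Med}|/\mathrm{MAD})$ and that a point is kept when its depth is at least a threshold `beta`; the function names and the default `beta = 0.1` are illustrative choices, not taken from the cited papers.

```python
import statistics


def med_mad(sample):
    """Return the sample median and the median of absolute deviations."""
    med = statistics.median(sample)
    mad = statistics.median([abs(v - med) for v in sample])
    return med, mad


def projection_depth(x, med, mad):
    """Univariate depth 1 / (1 + |x - Med| / MAD); assumes mad > 0."""
    return 1.0 / (1.0 + abs(x - med) / mad)


def depth_trimmed_mean(sample, beta=0.1):
    """Average the points whose depth is at least beta (assumed trimming rule)."""
    med, mad = med_mad(sample)
    if mad == 0:
        raise ValueError("MAD is zero; depth is undefined for this sample")
    kept = [v for v in sample if projection_depth(v, med, mad) >= beta]
    return sum(kept) / len(kept)
```

For the sample `[1, 2, 3, 4, 100]` the median is 3 and the MAD is 1, so the point 100 has depth 1/98 and is trimmed when `beta = 0.1`, while every point is kept when `beta = 0.01`.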

##### 2.2. The New Procedure

Let $X_1^*, \dots, X_n^*$ be a random sample from the empirical distribution $F_n$. It is often called a *bootstrap sample*. Let $X^{*1}, \dots, X^{*m}$ be $m$ bootstrap samples from $F_n$.

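We calculate $T_j = \mathrm{PTM}^{\beta}(X^{*j})$ for $j = 1, \dots, m$. Now we calculate the depth of each $T_j$ with respect to $\{T_1, \dots, T_m\}$, that is, $\mathrm{PD}(T_j, \{T_1, \dots, T_m\})$, and then order the $T_j$ with respect to their depth from the smallest to the largest: $T_{(1)}, T_{(2)}, \dots, T_{(m)}$, where $\mathrm{PD}(T_{(1)}, \{T_1, \dots, T_m\}) \le \cdots \le \mathrm{PD}(T_{(m)}, \{T_1, \dots, T_m\})$.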

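Finally, we simply delete the first $\lfloor 2\alpha m \rfloor$ points from $T_{(1)}, \dots, T_{(m)}$. Then the interval (or closed convex hull in high dimensions) formed by the remaining points $T_{(\lfloor 2\alpha m \rfloor + 1)}, \dots, T_{(m)}$ is our $100(1-2\alpha)\%$ confidence interval for $\mu$, where $\lfloor \cdot \rfloor$ is the floor function.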
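The steps of this subsection can be sketched in one dimension as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the depth is taken to be $1/(1 + |x - \mathrm{Med}|/\mathrm{MAD})$, the estimator on each bootstrap sample is a depth-trimmed mean with threshold `beta`, and the number of least-deep estimates discarded is assumed to be `floor(2 * alpha * m)`. All function names and default values are hypothetical.

```python
import math
import random
import statistics


def _med_mad(values):
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


def _depth(x, med, mad):
    """Depth 1 / (1 + |x - med| / mad), with guards for a zero MAD."""
    dev = abs(x - med)
    if dev == 0:
        return 1.0
    if mad == 0:
        return 0.0
    return 1.0 / (1.0 + dev / mad)


def trimmed_mean(sample, beta):
    """Mean of the points with depth >= beta (beta <= 0.5 keeps it non-empty)."""
    med, mad = _med_mad(sample)
    return statistics.fmean(v for v in sample if _depth(v, med, mad) >= beta)


def depth_bootstrap_interval(sample, alpha=0.025, beta=0.1, m=1000, seed=0):
    """Bootstrap, order the estimates by depth, drop the least deep ones."""
    rng = random.Random(seed)
    n = len(sample)
    # One depth-trimmed mean per bootstrap sample (resampling with replacement).
    estimates = [trimmed_mean(rng.choices(sample, k=n), beta) for _ in range(m)]
    # Depth of each estimate with respect to the set of all estimates.
    med, mad = _med_mad(estimates)
    ordered = sorted(estimates, key=lambda t: _depth(t, med, mad))
    # Assumed trimming count: the floor(2 * alpha * m) least deep estimates.
    kept = ordered[math.floor(2 * alpha * m):]
    return min(kept), max(kept)
```

A call such as `depth_bootstrap_interval(data, alpha=0.025, m=1000)` returns the end points of the interval spanned by the retained estimates. Unlike an equal-tailed percentile interval, the depth ordering need not remove the same number of estimates from each tail.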

#### 3. Simulation Study

Now we conduct a simulation study to examine the performance of the new procedure and the classical $t$ (or $T^2$) procedure based on replicated samples from various distributions (including normal and $t$ distributions, and others). We consider combinations of the sample size $n$ with the bootstrap number $m$.

We will confine attention to the average length (or area) of the confidence interval (or region) from both procedures, as well as their coverage frequency of the true parameter $\mu$ (which is assumed to be the mean of $F$), which ideally should be close to the nominal level $1-2\alpha$. If both procedures can reach the nominal level, then it is obviously better to have a shorter (or smaller) confidence interval (or region), that is, a smaller average length (or area) of the intervals (or regions).
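The two criteria, coverage frequency and average length, can be estimated with a small harness like the one below. It is a hedged sketch: `z_interval` is a large-sample normal-quantile stand-in for the classical interval (used here only so the example needs nothing beyond the standard library), and the names, default sample size, and default replication count are illustrative assumptions rather than the settings of the study.

```python
import random
import statistics


def z_interval(sample, level=0.95):
    """Large-sample interval mean +/- z * s / sqrt(n) (normal-quantile stand-in)."""
    n = len(sample)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half = z * statistics.stdev(sample) / n ** 0.5
    center = statistics.fmean(sample)
    return center - half, center + half


def coverage_and_length(interval_fn, draw, true_mean, n=100, reps=1000, seed=0):
    """Estimate the coverage frequency and the average interval length."""
    rng = random.Random(seed)
    hits, total_length = 0, 0.0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        lo, hi = interval_fn(sample)
        hits += lo <= true_mean <= hi
        total_length += hi - lo
    return hits / reps, total_length / reps
```

Here `draw` is any function that takes a `random.Random` instance and returns one observation, for example `lambda r: r.gauss(0.0, 1.0)`. Any other interval procedure with the same signature as `z_interval` can be passed in its place, so two procedures can be compared on identical settings.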

##### 3.1. One Dimension

Table 1 lists the simulation results at the normal and $t$ distributions.

Inspecting the table immediately reveals that the bootstrap number $m$ affects the average coverage of the new procedure: as $m$ increases, the coverage gets closer to the nominal level, while the average length of the intervals gets slightly larger. Of course, $m$ does not affect the $t$ procedure, which has nothing to do with the bootstrap. Overall, both procedures are indeed (roughly) $100(1-2\alpha)\%$ procedures, and the new one produces an interval that is on average shorter than that of the classical procedure even in the normal case; it is also shorter in the $t$ case.

Figure 1 displays typical single-run results from the two procedures based on 100 sample points from (a) the normal distribution and (b) the $t$ distribution. We see that even in the normal case the new procedure outperforms the classical procedure, with a confidence interval shorter than that of $t$; both cover the target parameter $\mu$. In the $t$ case, the new procedure again produces an interval shorter than that of $t$, and both cover the target parameter $\mu$.

**Figure 1:** Single-run confidence intervals from the two procedures; panels (a) and (b).

In our simulation studies, we also compare our new procedure with the existing *bootstrap percentile confidence procedure*. That procedure orders the means of the bootstrap samples and then trims the upper and lower $\lceil \alpha m \rceil$ points; the remaining points form an interval, which is called the bootstrap percentile confidence interval, where $\lceil x \rceil$ is the ceiling function of $x$. Our new procedure also outperforms this one, but the latter performs better than the classical procedure in terms of the average length of intervals at the same confidence level.
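For comparison, the bootstrap percentile procedure described above can be sketched as follows. The number of points trimmed from each tail is assumed to be `ceil(alpha * m)`; the function name and the defaults are illustrative.

```python
import math
import random
import statistics


def percentile_interval(sample, alpha=0.025, m=1000, seed=0):
    """Bootstrap percentile interval built from ordered bootstrap means."""
    rng = random.Random(seed)
    n = len(sample)
    # Means of m bootstrap samples, ordered from smallest to largest.
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(m))
    # Assumed trimming count per tail; requires m > 2 * k.
    k = math.ceil(alpha * m)
    kept = means[k:m - k]
    return kept[0], kept[-1]
```

The interval is equal-tailed by construction: the same number of ordered bootstrap means is removed from each end.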

Our experiments with the sample size $n$ also reveal that a small $n$ (the real situation in practice) favors our new procedure. Note that this is exactly the case where it is difficult to determine whether the data are *close to normal*, and hence to decide whether one is able to use the classical $t$ CI. This is what we expected, since the classical $t$ CI is based on normality (or on the large sample property for large $n$). But this does not mean that the classical $t$ CI has an edge over the new procedure at a really large sample size (say, 10,000), even in the perfect normal case.

In addition to the distributions we considered in Table 1, we also conduct simulation studies to compare the performance of the new and classical $t$ procedures at the contaminated normal model $(1-\epsilon)N(0,1) + \epsilon N(\mu, \sigma^2)$ with different choices of $\mu$ and $\sigma^2$, since we know that in practice there is never a pure (exact) normal distribution; we may have just a slight departure from the pure normal or some contamination. Our results reveal that the new procedure is overwhelmingly more robust than the classical $t$; this is what we would expect, since the $t$ procedure depends on the sample mean, which is notorious for its extreme sensitivity to outliers or contamination. We also compare the performance of the two procedures at the Cauchy distribution, since we know that the sample mean performs extremely well at symmetric light-tailed distributions like the normal but not so well at heavy-tailed ones like the Cauchy distribution.
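Sampling from a contaminated normal model can be sketched as below. The sketch assumes the uncontaminated component is $N(0,1)$ and that a fraction `eps` of the observations comes from a normal distribution with mean `mu` and standard deviation `sigma`; these parameter names are illustrative and no specific values from the study are implied.

```python
import random


def contaminated_normal(n, eps, mu, sigma, seed=0):
    """Draw n points from the mixture (1 - eps) * N(0, 1) + eps * N(mu, sigma^2)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        if rng.random() < eps:
            sample.append(rng.gauss(mu, sigma))   # contaminating component
        else:
            sample.append(rng.gauss(0.0, 1.0))    # pure standard normal component
    return sample
```

Under these assumptions the mean of the mixture is `eps * mu`, which is the target an interval for the mean would be expected to cover.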

We first display the typical single run results of confidence intervals in Figure 2 to demonstrate the difference between the two procedures.

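**Figure 2:** Single-run confidence intervals from the $t$ procedure (red) and the new procedure (blue); panels (a) left and (b) right, the latter at the Cauchy distribution.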

Here in Figure 2, on the left-hand side are the CIs by $t$ (red) and by our new procedure (blue); the interval from the new procedure is longer than that of $t$. These intervals are meant to estimate the mean parameter $\mu$ of the model, and both intervals cover this unknown parameter.

On the right-hand side are the CIs by $t$ (red) and by the new procedure (blue) at the Cauchy distribution; the interval from the new procedure is shorter.

Of course, single-run results may not represent the overall performance of the two procedures, so we conduct a simulation over repeated replications. The results are listed in Table 2.

Inspecting the table immediately reveals that the classical procedure becomes useless in the heavy-tailed Cauchy case: its coverage exceeds the nominal level, and its confidence interval is extremely wide and no longer informative. At the same time, the new procedure can roughly reach the nominal level and provide a meaningful estimate of the underlying unknown parameter. We also list the results from the contaminated model, in which a small fraction of contamination is added to a pure normal model, with the contamination also coming from a normal distribution with a small variance. Under such a realistic situation, the classical procedure again becomes useless, since it never reaches the nominal level and its interval is slightly longer than that of the new procedure, while the new procedure is still a reasonable procedure with an interval that is on average shorter than that of the $t$ one.

##### 3.2. Higher Dimensions

In higher dimensions, with the multivariate versions of PD and PTM (see Zuo [4], Zuo [2]), it is straightforward to extend the new procedure described in Section 2. That is, from the bootstrap samples $X^{*1}, \dots, X^{*m}$ we calculate $T_j = \mathrm{PTM}^{\beta}(X^{*j})$ for $j = 1, \dots, m$. Then we calculate the projection depth of each $T_j$ with respect to $\{T_1, \dots, T_m\}$, and we order the $T_j$'s with respect to their depth from smallest to largest: $T_{(1)}, \dots, T_{(m)}$. The final step is the same as before: after trimming the first $\lfloor 2\alpha m \rfloor$ points from the $T_{(j)}$'s, the remaining points form a convex hull, which is our confidence region for $\mu$. We will examine the performance of this procedure and of the classical Hotelling's $T^2$ given in (1.2) in terms of their average area of confidence regions as well as their coverage frequency of the true parameter $\mu$ (which is assumed to be the mean of $F$). The latter ideally should be close to the nominal level $1-2\alpha$. If both procedures can reach the nominal level, then it is obviously better to have a smaller confidence region, that is, a smaller average area of confidence regions.
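The bivariate ordering and trimming step can be illustrated with a rough sketch. It approximates the projection depth of a point by maximizing the univariate outlyingness $|u'x - \mathrm{Med}|/\mathrm{MAD}$ over a finite grid of directions $u$, which is only an approximation and is not the exact algorithm of the ExPD2D package; the number of directions, the function names, and the caller-supplied `trim_count` are all assumptions made for illustration.

```python
import math
import statistics


def direction_stats(cloud, n_dirs=90):
    """Median and MAD of the projected cloud for each direction on a grid."""
    stats = []
    for k in range(n_dirs):
        angle = math.pi * k / n_dirs
        ux, uy = math.cos(angle), math.sin(angle)
        proj = [ux * x + uy * y for x, y in cloud]
        med = statistics.median(proj)
        mad = statistics.median([abs(p - med) for p in proj])
        if mad > 0:  # skip degenerate directions
            stats.append((ux, uy, med, mad))
    return stats


def approx_depth(point, stats):
    """Approximate projection depth: 1 / (1 + max outlyingness over the grid)."""
    px, py = point
    worst = max(abs(ux * px + uy * py - med) / mad for ux, uy, med, mad in stats)
    return 1.0 / (1.0 + worst)


def depth_trim(cloud, trim_count, n_dirs=90):
    """Order points by approximate depth and drop the trim_count least deep."""
    stats = direction_stats(cloud, n_dirs)
    ordered = sorted(cloud, key=lambda p: approx_depth(p, stats))
    return ordered[trim_count:]


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half_hull(pts) + half_hull(reversed(pts))


def polygon_area(hull):
    """Shoelace formula for the area enclosed by the hull vertices."""
    total = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

With the bootstrap estimates stored as a list of `(x, y)` tuples, `polygon_area(convex_hull(depth_trim(estimates, trim_count)))` gives the area of the resulting region. A finer direction grid brings the approximation closer to the exact depth at a higher computing cost.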

We first display single-run results of the two procedures at the bivariate standard normal distribution and the bivariate $t$ distribution with 3 degrees of freedom in Figure 3.

**Figure 3:** Single-run confidence regions from the two procedures; panels (a) and (b).

Of course, single-run results may not represent the overall performance of the two procedures. To see whether the single-run results are repeatable, we list the average coverage and the average area of the confidence regions from the two procedures over repeated replications in Table 3.

Inspecting the table reveals that the two procedures are indeed (roughly) $100(1-2\alpha)\%$ confidence procedures. Therefore it makes sense to compare their average area of confidence regions. The table entries show that the new procedure produces a confidence region that is on average smaller in area than that of the classical Hotelling's $T^2$ procedure even in the normal case, and also in the $t$ case.

#### 4. Concluding Remarks

From the last section we see that the new procedure has some advantages over the classical (seemingly optimal) procedures. But we know that these advantages do not come for free. What price do we have to pay? For all the advantages the new procedure possesses over the classical ones, the price is the intensive computing required to implement it. In our simulation study, there are 4 million basic operations (for the sample size, number of replications, and bootstrap number used). Computing data depth in two or higher dimensions is very challenging. Fortunately, an R package (called ExPD2D) for the exact computation of the projection depth of bivariate data has already been developed by Zuo and Ye [5] and is now part of CRAN. For high-dimensional computation, see Zuo [6]. In one dimension the computation is straightforward. One can compute the sample median in linear time (i.e., the worst-case time complexity is $O(n)$) by employing a special technique (see any computer science algorithms textbook); for further discussion of the properties of the related remedian, see H. Chen and Z. Chen [7]. Fortunately, in practice, only one replication is needed. Also, with the continuing advance in computing power, the computational burden should not be an excuse for not using a better procedure.
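As an illustration of median selection without a full sort, the sketch below uses quickselect with a random pivot. Note the assumption: this variant runs in linear time on average, not in the worst case; the worst-case linear bound mentioned above requires a deterministic pivot rule such as median-of-medians, which is omitted here for brevity. The function names are illustrative.

```python
import random


def quickselect(values, k, seed=0):
    """Return the k-th smallest element (0-based) in expected linear time."""
    rng = random.Random(seed)
    items = list(values)
    while True:
        pivot = rng.choice(items)
        lows = [v for v in items if v < pivot]
        highs = [v for v in items if v > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                      # answer lies below the pivot
        elif k < len(lows) + n_equal:
            return pivot                      # answer equals the pivot
        else:
            k -= len(lows) + n_equal          # answer lies above the pivot
            items = highs


def median(values):
    """Sample median via selection; averages the two middle values if n is even."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2
```

The same selection routine can be applied to the absolute deviations from the median to obtain the MAD.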

A natural question is why the new procedure has an advantage over the classical one. The procedure clearly depends on both the bootstrap and data depth. Is the advantage due to the bootstrap or to data depth? Which is the main contributor? If one uses just the bootstrap, can one gain some advantage? The answer to the latter question is positive. Indeed, in our simulation we compare the classical procedure with the bootstrap percentile procedure, and the comparison reveals that the bootstrap percentile procedure does have some mild advantage over the classical one but is still inferior to our new procedure. So both the bootstrap and data depth contribute to the advantages of the new procedure. But remember, it is data depth that makes the bootstrap percentile procedure (which originally was defined only in one dimension) implementable in high dimensions, by ordering the bootstrap sample mean vectors. Without data depth, it is impossible to implement the procedure in high dimensions. So overall, it is data depth that makes the major contribution to the advantages of the new procedure.

We would also like to point out that a different new procedure is introduced and studied in Zuo [8], where a depth-weighted mean is used in the procedure instead of the depth-trimmed mean used in our current procedure. However, our simulation studies indicate that our current new procedure is superior to the one in Zuo [8], which confines attention mainly to one dimension.

Our empirical evidence for the new procedure in one and higher dimensions is very promising, but we still need some theoretical developments and justifications, which are beyond the scope of this paper and will be pursued elsewhere. A heuristic argument is that the bootstrap percentile confidence interval has an advantage over the classical confidence interval procedure in that, at the same nominal level, it can produce an asymptotically shorter interval (see Hall [9], and Falk and Kaufmann [10]). But the classical bootstrap percentile interval procedure is limited to one dimension; here we use data depth to order high-dimensional estimators so that we can extend the procedure to high dimensions. The advantage of the bootstrap percentile confidence interval carries over to high dimensions.

One question left about our new procedure in practice is how one chooses the value of the trimming parameter $\beta$. There are at least two ways to deal with this problem. First, one can choose a fixed value; our empirical experience indicates that a value between 0.01 and 0.1 will serve most purposes. Second, one can choose the value dynamically by minimizing some objective function, which could be the interval length in our simulation case or the variance in the efficiency evaluation case. With such a data-dependent $\beta$, one natural question is whether the theory in Zuo [2], established for a fixed constant $\beta$, still holds. Fortunately, it all still holds if we employ a more powerful tool (empirical process theory) from Pollard [11] or van der Vaart and Wellner [12] to handle the situation with a data-dependent $\beta$.

There are a number of depth functions and related depth estimators (see Tukey [13], Liu [14], Zuo and Serfling [3], and Bai and He [15]), but among them the projection depth function used here is the most favorable one (see Zuo [4, 16]). Furthermore, the computation of depth functions is in general very challenging, but we have an algorithm at hand for the projection depth function; this is yet another motivation for us to pick the projection depth function in this paper.

Finally, we comment that the findings in this paper are consistent with the results obtained in Bai and Saranadasa (BS) [17], which show the *effect* of high dimension: there are better procedures than classical inference procedures like Hotelling's $T^2$, which is inferior to other procedures such as Dempster's nonexact test (Dempster [18]) and the test proposed by BS, even for moderately large dimension and sample sizes.

#### Acknowledgment

This research was partially supported by NSF Grants DMS-0234078 and DMS-0501174.