Abstract

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. We thereby provide a format that is compatible with a confidence interval centered about the mean and that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes. The intermediate value of the equation is confirmed with a computational test.

1. Introduction

A confidence interval is an interval in which a measurement or trial falls with a given probability [1, 2]. In statistics, confidence intervals centered about an estimate of the mean target the mean [1]. Because these confidence intervals rely on the standard error of the estimate, they widen as sample size decreases and narrow as sample size increases. Consequently, the confidence interval, in statistics, is often a margin of error for the estimate of the mean [3].

We are interested in finding the probability associated with a confidence interval that is centered about an estimate of the mean, targets an arbitrary measurement, and is independent of the sample size. In this way, we provide a format that is compatible with a confidence interval centered about the mean that is naturally independent of sample size, but has a width expressed in standard deviations. We approach this problem by considering the known case of a confidence interval centered about the mean, and we derive the associated probability. We relate this result to the case of infinite sample size. The next step is to calculate the probability associated with the minimum sample size of one. This requires that we average the probability over the different possible sample values. Finally, we propose an equation for probability that naturally interpolates our results, and we show that the equation is consistent with intermediate probability values by comparing it to estimates of probability produced by computer simulation.

2. Expected Probability

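In order to make the necessary confidence interval calculations, we must determine an expected, or average, probability. This can be examined by considering the cumulative distribution function of the normal distribution [4, 5],

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\!\left[-\frac{(t-\mu)^2}{2\sigma^2}\right] dt, \tag{2.1}$$

where $x$ is a measurement, $\mu$ is the mean, and $\sigma$ is the standard deviation.

Let $u = (t-\mu)/(\sigma\sqrt{2})$. So, $dt = \sigma\sqrt{2}\,du$. Therefore, (2.1) becomes

$$F(x) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{(x-\mu)/(\sigma\sqrt{2})} e^{-u^2}\,du = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right],$$

where $\operatorname{erf}$ is the error function [6, 7].

Now, we know that the probability that a measurement does not exceed $x$ is

$$P(x) = F(x). \tag{2.4}$$

Rather than specifying a value of $x$, we would like to compute the probability averaged over all possible values of $x$. So we have

$$\langle P \rangle = \int_{-\infty}^{\infty} P(x)\,p(x)\,dx,$$

where $p(x)$ is the probability density. This yields

$$\langle P \rangle = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx. \tag{2.6}$$

Let $u = (x-\mu)/(\sigma\sqrt{2})$. So, $dx = \sigma\sqrt{2}\,du$. Therefore, (2.6) becomes

$$\langle P \rangle = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} \left[1 + \operatorname{erf}(u)\right] e^{-u^2}\,du.$$

Since $\operatorname{erf}$ is an odd function and $e^{-u^2}$ is even, the term containing $\operatorname{erf}(u)$ integrates to zero, and we have

$$\langle P \rangle = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-u^2}\,du = \frac{1}{2}.$$

This can be easily confirmed by an alternative calculation. The probability density is

$$p(x) = \frac{dF}{dx}.$$

Therefore, we can write [8]

$$p(x)\,dx = dF.$$

The expected value [9] of the cumulative distribution function is thus

$$\langle F \rangle = \int_{-\infty}^{\infty} F(x)\,p(x)\,dx.$$

Note that when $x \to -\infty$, $F = 0$, and when $x \to \infty$, $F = 1$. Therefore, we can write

$$\langle F \rangle = \int_{0}^{1} F\,dF = \frac{1}{2}.$$

Consequently, as per (2.4), we can write the expected probability as

$$\langle P \rangle = \frac{1}{2}.$$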
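The statement that the expected value of the normal cumulative distribution function is one half, whatever the mean and standard deviation, can be checked numerically. The following is a minimal sketch, assuming trapezoidal quadrature over a range of ten standard deviations on either side of the mean; the function names, the integration range, and the step count are illustrative choices and are not taken from the source.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution function written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expected_cdf(mu=0.0, sigma=1.0, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the integral of F(x) p(x) dx.

    The infinite range is truncated to mu +/- half_width * sigma (an
    assumption of this sketch); the neglected tails are negligible.
    """
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * normal_cdf(x, mu, sigma) * normal_pdf(x, mu, sigma)
    return total * h


# The estimate should be close to 0.5 for any choice of mu and sigma.
print(expected_cdf())
print(expected_cdf(mu=3.0, sigma=2.0))
```

Because the integrand is smooth and decays rapidly, a simple trapezoidal rule is sufficient here; any standard quadrature routine could be substituted.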

3. The Probability of a Confidence Interval

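We want to determine the probability given as

$$P = P\left(\bar{x} - n\sigma \le x \le \bar{x} + n\sigma\right),$$

where $\bar{x}$ is the estimate of the mean obtained from a sample of size $N$,

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i.$$

At the maximal value of $N$, we find that

$$\lim_{N \to \infty} \bar{x} = \mu.$$

For a normal distribution, the probability that a measurement falls within $n$ standard deviations of the mean (i.e., within the interval $[\mu - n\sigma,\ \mu + n\sigma]$) is given by

$$P = \frac{1}{\sigma\sqrt{2\pi}} \int_{\mu - n\sigma}^{\mu + n\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx.$$

Now, let $u = (x-\mu)/(\sigma\sqrt{2})$, so $dx = \sigma\sqrt{2}\,du$. Then

$$P = \frac{1}{\sqrt{\pi}} \int_{-n/\sqrt{2}}^{n/\sqrt{2}} e^{-u^2}\,du = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right). \tag{3.7}$$

Observe that we can express (3.7) in terms of the cumulative distribution function $F$. Thus, we can write

$$P = F(\mu + n\sigma) - F(\mu - n\sigma).$$

The minimal value of $N$ is one, so that $\bar{x} = x_1$.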

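In this case, we have

$$P = F(x_1 + n\sigma) - F(x_1 - n\sigma). \tag{3.9}$$

We find that (3.9) reduces to

$$P = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{x_1 - \mu + n\sigma}{\sigma\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{x_1 - \mu - n\sigma}{\sigma\sqrt{2}}\right)\right]. \tag{3.10}$$

The probability in (3.10) is written in terms of the value $x_1$. Consequently, we can derive an expected probability by computing

$$\langle P \rangle = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} P\,\exp\!\left[-\frac{(x_1-\mu)^2}{2\sigma^2}\right] dx_1. \tag{3.12}$$

Let $u = (x_1-\mu)/(\sigma\sqrt{2})$, which implies that $dx_1 = \sigma\sqrt{2}\,du$. This reduces (3.12) to

$$\langle P \rangle = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} \left[\operatorname{erf}\!\left(u + \frac{n}{\sqrt{2}}\right) - \operatorname{erf}\!\left(u - \frac{n}{\sqrt{2}}\right)\right] e^{-u^2}\,du. \tag{3.14}$$

Equation (3.14) can be evaluated numerically to yield the following:

$$\langle P \rangle = \operatorname{erf}\!\left(\frac{n}{2}\right). \tag{3.15}$$

A portion of the numerical results is presented in Table 1.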
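The average over the single sample value can be evaluated numerically in a few lines. The sketch below is a minimal illustration, assuming standardized units (mean 0, standard deviation 1) and trapezoidal quadrature; the function names and integration settings are hypothetical and are not taken from the source. For comparison it also prints erf(n/2), which follows from the standard fact that the difference of two independent normal variables with the same standard deviation has twice the variance of either.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def interval_probability(z1, n):
    """Probability that a measurement lies within n standard deviations
    of a single sample value z1 (standardized units)."""
    return std_normal_cdf(z1 + n) - std_normal_cdf(z1 - n)


def expected_probability_single_sample(n, half_width=10.0, steps=20000):
    """Average interval_probability over the distribution of z1.

    Trapezoidal rule on [-half_width, half_width]; the truncation of the
    infinite range is an assumption of this sketch.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z1 = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * interval_probability(z1, n) * std_normal_pdf(z1)
    return total * h


# Compare the numerical average with erf(n / 2) for a few interval widths.
for n in (1, 2, 3):
    numeric = expected_probability_single_sample(n)
    print(n, round(numeric, 6), round(math.erf(n / 2.0), 6))
```

Working in standardized units loses no generality, because the substitution used in the derivation removes the mean and standard deviation from the integral.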

Based on (3.7) and (3.15), we can propose equation (3.16), in which the probability is, by default, the expected probability.

Equation (3.16) clearly converges to the results for the extreme sample sizes, since it reduces to (3.15) for $N = 1$ and to (3.7) as $N \to \infty$. A plot of (3.16) for four values of $N$ is shown in Figure 1. Observe that the curve for the intermediate sample size lies between the other curves. Consequently, we can have confidence in (3.16) if we can show that it is valid for that sample size.
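As a cross-check on these limits, and not as a reconstruction of equation (3.16) itself, the probability can also be written down directly under the assumption that the measurement and the $N$ sample values are independent and normal with the same mean and standard deviation. The sketch below uses only the standard fact that a difference of independent normal variables is normal, with variances that add.

```latex
% Assumption: x and x_1, ..., x_N are independent N(mu, sigma^2) variables,
% so the sample mean \bar{x} has variance sigma^2 / N.
\begin{align*}
x - \bar{x} &\sim \mathcal{N}\!\left(0,\ \sigma^2\left(1 + \tfrac{1}{N}\right)\right), \\
P\left(|x - \bar{x}| \le n\sigma\right)
  &= \operatorname{erf}\!\left(\frac{n}{\sqrt{2\left(1 + 1/N\right)}}\right), \\
N = 1 &: \quad \operatorname{erf}\!\left(\frac{n}{2}\right), \\
N \to \infty &: \quad \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right).
\end{align*}
```

Any interpolating equation that agrees with simulation at intermediate sample sizes should be numerically close to this expression, which makes it a convenient reference when checking the plotted curves.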

4. Computational Test

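We can estimate $P$ computationally. Simulate normal, independent random variables: a measurement $x$ and the sample values from which the estimate $\bar{x}$ of the mean is computed. Let the condition be

$$|x - \bar{x}| \le n\sigma.$$

If we count the number of trials in which the condition is met, then an estimate of $P$ is given as the ratio of that count to the total number of trials. Figure 2 shows a plot of $P$ versus $n$ for a fixed sample size and a fixed total number of trials.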
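The simulation described above can be sketched as follows. This is a minimal illustration and not the source's program: the function names, the default of 100,000 trials, the fixed seed, and the choice of a sample size of two in the demonstration loop are all assumptions. The reference value printed alongside is the standard closed form for independent normal variables, erf(n / sqrt(2(1 + 1/N))), and is included only for comparison.

```python
import math
import random


def estimate_probability(n, sample_size, trials=100_000,
                         mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of the probability that a measurement falls
    within n standard deviations of a sample-mean estimate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One independent measurement.
        x = rng.gauss(mu, sigma)
        # Estimate of the mean from an independent sample.
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mean_estimate = sum(sample) / sample_size
        # Condition: measurement within n standard deviations of the estimate.
        if abs(x - mean_estimate) <= n * sigma:
            hits += 1
    return hits / trials


def reference_probability(n, sample_size):
    """Closed form for independent normal variables (comparison only)."""
    return math.erf(n / math.sqrt(2.0 * (1.0 + 1.0 / sample_size)))


# Demonstration for a sample size of two (an assumed intermediate value).
for n in (1, 2, 3):
    estimate = estimate_probability(n, sample_size=2)
    print(n, round(estimate, 4), round(reference_probability(n, 2), 4))
```

The statistical error of each estimate scales as the inverse square root of the number of trials, so increasing `trials` tightens the agreement at the cost of run time.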

5. Conclusions

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. We thereby provide a format that is compatible with a confidence interval centered about the mean and that is naturally independent of the sample size. It is consistent with our equation that the probability decreases as the sample size decreases. However, for sample sizes greater than ten, the value of the probability begins to converge. The equation for probability is derived by considering the minimal and maximal sample sizes and producing an equation that naturally interpolates the results. Computer simulation is used to estimate the probability for the sample size that produces intermediate results, and the estimates are in strong agreement with the general equation.

Acknowledgment

Discussions with Maxwell Lueckenhoff are appreciated.