The Probability That a Measurement Falls within a Range of Standard Deviations from an Estimate of the Mean
We derive a general equation for the probability that a measurement falls within a range of $n$ standard deviations from an estimate of the mean. In doing so, we provide a format that is compatible with a confidence interval centered about the mean yet is naturally independent of the sample size. The equation is derived by interpolating theoretical results for the extreme sample sizes, and an intermediate value of the equation is confirmed with a computational test.
1. Introduction

A confidence interval is an interval in which a measurement or trial falls with a given probability [1, 2]. In statistics, confidence intervals centered about an estimate of the mean target the mean $\mu$. Because these confidence intervals rely on the standard error of the estimate, they widen as the sample size decreases and narrow as the sample size increases. Consequently, the confidence interval, in statistics, often serves as a margin of error for the estimate of the mean.
We are interested in finding the probability of a confidence interval that is centered about an estimate of the mean, targets an arbitrary measurement, and is independent of the sample size. In this way, we provide a format that is compatible with a confidence interval centered about the mean that is naturally independent of the sample size but has a width of $2n$ standard deviations. We approach this problem by considering the known case of a confidence interval centered about the mean and deriving the associated probability; this corresponds to the case of infinite sample size. The next step is to calculate the probability associated with the minimum sample size of one, which requires averaging the probability over the possible sample values. Finally, we propose an equation for the probability that naturally interpolates these results, and we show that the equation is consistent with intermediate probability values by comparing it to estimates of the probability produced by computer simulation.
2. Expected Probability
In order to make the necessary confidence interval calculations, we must determine an expected, or average, probability. This can be examined by considering the cumulative distribution function [4, 5]
$$\Phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(y-\mu)^2/2\sigma^2}\,dy, \tag{2.1}$$
where $x$ is a measurement, $\mu$ is the mean, and $\sigma$ is the standard deviation.
Let $t = (y-\mu)/(\sigma\sqrt{2})$, so that $dy = \sigma\sqrt{2}\,dt$. Therefore, (2.1) becomes
$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right], \tag{2.2}$$
where
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\,dt$$
is the error function [6, 7].
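As a sanity check on this step (a sketch added here, not part of the original derivation; the helper names are illustrative), the erf form of the cumulative distribution function can be compared against direct numerical integration of the Gaussian integral in (2.1):

```python
import math

def phi(x, mu=0.0, sigma=1.0):
    # erf form of the normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def phi_numeric(x, mu=0.0, sigma=1.0, lo=-10.0, steps=200_000):
    # Midpoint-rule evaluation of the defining integral of the CDF,
    # truncated at lo (the tail below lo is negligible here).
    h = (x - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2))
    return total * h / (sigma * math.sqrt(2.0 * math.pi))

print(abs(phi(1.3) - phi_numeric(1.3)) < 1e-6)  # True: the two agree
```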
Now, we know that the measurement $x$ is itself normally distributed with probability density
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}. \tag{2.3}$$
Rather than specifying a value of $x$, we would like to compute the probability averaged over all possible values of $x$. So we have
$$\langle \Phi \rangle = \int_{-\infty}^{\infty} p(x)\,\Phi(x)\,dx. \tag{2.4}$$
This yields
$$\langle \Phi \rangle = \frac{1}{2} + \frac{1}{2\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2} \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right) dx. \tag{2.5}$$
Let $t = (x-\mu)/(\sigma\sqrt{2})$, so that $dx = \sigma\sqrt{2}\,dt$. Therefore, (2.5) becomes
$$\langle \Phi \rangle = \frac{1}{2} + \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2} \operatorname{erf}(t)\,dt. \tag{2.6}$$
Since $\operatorname{erf}$ is an odd function, the integral vanishes and we have
$$\langle \Phi \rangle = \frac{1}{2}. \tag{2.7}$$
This can be easily confirmed by an alternative calculation. The probability density is
$$p(x) = \frac{d\Phi}{dx}. \tag{2.8}$$
Therefore, we can write
$$\langle \Phi \rangle = \int_{-\infty}^{\infty} \Phi\,\frac{d\Phi}{dx}\,dx = \int \Phi\,d\Phi. \tag{2.9}$$
Note that when $x \to -\infty$, $\Phi = 0$, and when $x \to \infty$, $\Phi = 1$. Therefore, the expected value of the cumulative distribution function is
$$\langle \Phi \rangle = \int_{0}^{1} \Phi\,d\Phi = \frac{1}{2}. \tag{2.10}$$
Consequently, as per (2.4), we can write the expected probability as $\langle P \rangle = \langle \Phi \rangle = 1/2$.
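The averaging step can be checked directly (a numerical sketch added here; it is not part of the original text). Quadrature of the average of the cumulative distribution function over a standard normal measurement recovers the value $1/2$:

```python
import math

def phi(t):
    # Standard normal CDF (mu = 0, sigma = 1).
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def p(t):
    # Standard normal probability density.
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

# Midpoint-rule quadrature of <Phi> = integral of p(x) * Phi(x) over the real line.
lo, hi, steps = -10.0, 10.0, 100_000
h = (hi - lo) / steps
mean_phi = sum(p(lo + (k + 0.5) * h) * phi(lo + (k + 0.5) * h)
               for k in range(steps)) * h
print(round(mean_phi, 6))  # 0.5
```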
3. The Probability of a Confidence Interval
We want to determine the probability given as
$$P = \operatorname{Prob}\left(|x - \bar{x}_N| \le n\sigma\right), \tag{3.1}$$
where $\bar{x}_N$ is the estimate of the mean,
$$\bar{x}_N = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{3.2}$$
At the maximal value of $N$ ($N \to \infty$), we find that
$$\bar{x}_N \to \mu. \tag{3.3}$$
For a normal distribution, the probability that a measurement falls within $n$ standard deviations of the mean (i.e., within the interval $[\mu - n\sigma,\, \mu + n\sigma]$) is given by
$$P = \frac{1}{\sigma\sqrt{2\pi}} \int_{\mu - n\sigma}^{\mu + n\sigma} e^{-(x-\mu)^2/2\sigma^2}\,dx. \tag{3.4}$$
Now, let $t = (x-\mu)/(\sigma\sqrt{2})$, so $dx = \sigma\sqrt{2}\,dt$. Then
$$P = \frac{1}{\sqrt{\pi}} \int_{-n/\sqrt{2}}^{n/\sqrt{2}} e^{-t^2}\,dt = \frac{2}{\sqrt{\pi}} \int_{0}^{n/\sqrt{2}} e^{-t^2}\,dt, \tag{3.5, 3.6}$$
so that
$$P = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right). \tag{3.7}$$
Observe that we can express (3.7) in terms of the cumulative distribution function $\Phi$. Thus, we can write
$$P = \Phi(\mu + n\sigma) - \Phi(\mu - n\sigma). \tag{3.8}$$
The minimal value of $N$ is one, so that $\bar{x}_1 = x_1$.
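The large-sample result $P = \operatorname{erf}(n/\sqrt{2})$ reproduces the familiar 68–95–99.7 coverage values for a normal distribution; a minimal check:

```python
import math

# Probability that a measurement falls within n standard deviations of the mean.
for n in (1, 2, 3):
    p = math.erf(n / math.sqrt(2.0))
    print(n, round(p, 4))  # 1 0.6827, 2 0.9545, 3 0.9973
```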
In this case, we have
$$P(x_1) = \operatorname{Prob}\left(x_1 - n\sigma \le x \le x_1 + n\sigma\right) = \Phi(x_1 + n\sigma) - \Phi(x_1 - n\sigma). \tag{3.9}$$
We find that (3.9) reduces to
$$P(x_1) = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{x_1 - \mu + n\sigma}{\sigma\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{x_1 - \mu - n\sigma}{\sigma\sqrt{2}}\right)\right]. \tag{3.10}$$
The probability in (3.10) is written in terms of the sample value $x_1$. Consequently, we can derive an expected probability by computing
$$\langle P \rangle = \int_{-\infty}^{\infty} p(x_1)\,P(x_1)\,dx_1, \tag{3.11}$$
that is,
$$\langle P \rangle = \frac{1}{2\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x_1-\mu)^2/2\sigma^2} \left[\operatorname{erf}\!\left(\frac{x_1 - \mu + n\sigma}{\sigma\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{x_1 - \mu - n\sigma}{\sigma\sqrt{2}}\right)\right] dx_1. \tag{3.12}$$
Let $t = (x_1 - \mu)/(\sigma\sqrt{2})$, which implies that $dx_1 = \sigma\sqrt{2}\,dt$. This reduces (3.12) to
$$\langle P \rangle = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2} \left[\operatorname{erf}\!\left(t + \frac{n}{\sqrt{2}}\right) - \operatorname{erf}\!\left(t - \frac{n}{\sqrt{2}}\right)\right] dt. \tag{3.13}$$
Since $\operatorname{erf}$ is odd, the two terms contribute equally, so
$$\langle P \rangle = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2} \operatorname{erf}\!\left(t + \frac{n}{\sqrt{2}}\right) dt. \tag{3.14}$$
Equation (3.14) can be evaluated numerically to yield the following:
$$\langle P \rangle = \operatorname{erf}\!\left(\frac{n}{2}\right). \tag{3.15}$$
A portion of the numerical results is presented in Table 1.
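The numerical evaluation can be sketched as follows (a check added here, assuming the averaged integrand reconstructed above, i.e. $\frac{1}{2\sqrt{\pi}}\int e^{-t^2}\,[\operatorname{erf}(t+n/\sqrt{2})-\operatorname{erf}(t-n/\sqrt{2})]\,dt$). Midpoint-rule quadrature agrees with $\operatorname{erf}(n/2)$:

```python
import math

def expected_p(n, lo=-8.0, hi=8.0, steps=40_000):
    # Midpoint-rule quadrature of the averaged probability:
    # (1 / (2 sqrt(pi))) * int e^{-t^2} [erf(t + n/sqrt(2)) - erf(t - n/sqrt(2))] dt.
    a = n / math.sqrt(2.0)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        total += math.exp(-t * t) * (math.erf(t + a) - math.erf(t - a))
    return total * h / (2.0 * math.sqrt(math.pi))

for n in (1, 2, 3):
    # Quadrature value alongside erf(n/2); the pairs should match.
    print(n, round(expected_p(n), 6), round(math.erf(n / 2.0), 6))
```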
Based on (3.7) and (3.15), we can propose the equation
$$P(n, N) = \operatorname{erf}\!\left(\frac{n}{\sqrt{2\left(1 + 1/N\right)}}\right), \tag{3.16}$$
in which we are by default referring to the expected probability.
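Taking the interpolating equation in the form $P(n, N) = \operatorname{erf}\big(n/\sqrt{2(1+1/N)}\big)$ (the natural interpolation of the two limiting results, reconstructed here), both extremes can be verified directly:

```python
import math

def prob(n, N):
    # P(n, N) = erf(n / sqrt(2 (1 + 1/N))): reduces to erf(n/2) at N = 1
    # and approaches erf(n/sqrt(2)) as N grows without bound.
    return math.erf(n / math.sqrt(2.0 * (1.0 + 1.0 / N)))

print(math.isclose(prob(2, 1), math.erf(1.0)))                                # True
print(math.isclose(prob(2, 10**9), math.erf(math.sqrt(2.0)), rel_tol=1e-6))   # True
```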
Equation (3.16) clearly converges to the results for the extreme sample sizes, since
$$\lim_{N \to \infty} P(n, N) = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right), \qquad P(n, 1) = \operatorname{erf}\!\left(\frac{n}{2}\right). \tag{3.17}$$
A plot of (3.16) for four values of $N$ is shown in Figure 1. Observe that the curves for intermediate values of $N$ lie between the two limiting curves. Consequently, we can have confidence in (3.16) if we can show that it is valid for an intermediate sample size.
4. Computational Test
We can estimate $P$ computationally. Simulate the normal, independent random variables $x$ and $x_i$, $i = 1, \ldots, N$, each with mean $\mu$ and standard deviation $\sigma$. Let the condition be
$$|x - \bar{x}_N| \le n\sigma. \tag{4.1}$$
If $T_c$ is the number of trials in which the condition is met and $T$ is the total number of trials, then an estimate of $P$ is given as
$$\hat{P} = \frac{T_c}{T}. \tag{4.2}$$
Figure 2 shows a plot of $\hat{P}$ versus $n$ for an intermediate value of $N$ and a fixed number of trials $T$.
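The estimation procedure above can be sketched as follows (the choices $N = 2$ and the trial count here are illustrative, not necessarily those used for Figure 2):

```python
import math
import random

def simulate(n, N, trials=200_000, mu=0.0, sigma=1.0, seed=1):
    # Fraction of trials in which a fresh measurement x falls within
    # n * sigma of the mean of N simulated samples.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(N)) / N
        x = rng.gauss(mu, sigma)
        if abs(x - xbar) <= n * sigma:
            hits += 1
    return hits / trials

n, N = 2.0, 2
est = simulate(n, N)
pred = math.erf(n / math.sqrt(2.0 * (1.0 + 1.0 / N)))
print(round(est, 3), round(pred, 3))  # the two values should be close
```

With 200,000 trials the standard error of the estimate is below $10^{-3}$, so agreement with the proposed equation to roughly three decimal places is expected.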
5. Conclusion

We have derived a general equation for the probability that a measurement falls within a range of $n$ standard deviations from an estimate of the mean. In doing so, we provide a format that is compatible with a confidence interval centered about the mean that is naturally independent of the sample size. Consistent with our equation, the probability decreases as the sample size decreases; however, for sample sizes greater than ten, the probability begins to converge. The equation is derived by considering the minimal and maximal sample sizes and producing an expression that naturally interpolates the two results. Computer simulation is used to estimate the probability for an intermediate sample size, and the estimates are in strong agreement with the general equation.
Acknowledgments

Discussions with Maxwell Lueckenhoff are appreciated.
References

[1] J. F. Kenney and E. S. Keeping, "Confidence limits for the binomial parameter" and "Confidence interval charts," §11.4 and 11.5 in Mathematics of Statistics, Part 1, pp. 167–169, D. Van Nostrand, Princeton, NJ, USA, 3rd edition, 1962.
[2] A. Tennant and E. M. Badley, "A confidence interval approach to investigating non-response bias and monitoring response to postal questionnaires," Journal of Epidemiology and Community Health, vol. 45, no. 1, pp. 81–85, 1991.
[3] D. G. Rees, Essential Statistics, Chapman & Hall/CRC, 4th edition, 2001.
[4] D. Zwillinger and S. Kokoska, CRC Standard Probability and Statistics Tables and Formulae, Chapman & Hall/CRC, Boca Raton, FL, USA, 2000.
[5] U. Balasooriya, J. Li, and C. K. Low, "On interpreting and extracting information from the cumulative distribution function curve: a new perspective with applications," Australian Senior Mathematics Journal, vol. 26, no. 1, 2012.
[6] J. Spanier and K. B. Oldham, "The error function and its complement," chapter 40 in An Atlas of Functions, pp. 385–393, Hemisphere, Washington, DC, USA, 1987.
[7] L. M. Houston, G. A. Glass, and A. D. Dymnikov, "Sign-bit amplitude recovery in Gaussian noise," Journal of Seismic Exploration, vol. 19, no. 3, pp. 249–262, 2010.
[8] N. G. Ushakov, "Density of a probability distribution," in Encyclopedia of Mathematics, M. Hazewinkel, Ed., Springer, 2001.
[9] L. M. Houston, G. A. Glass, and A. D. Dymnikov, "Sign data derivative recovery," ISRN Applied Mathematics, vol. 2012, Article ID 630702, 7 pages, 2012.