Confidence Intervals for the Coefficient of Variation in a Normal Distribution with a Known Population Mean
This paper presents three confidence intervals for the coefficient of variation in a normal distribution with a known population mean. One of the proposed confidence intervals is based on the normal approximation. The other proposed confidence intervals are the shortest-length confidence interval and the equal-tailed confidence interval. A Monte Carlo simulation study was conducted to compare the performance of the proposed confidence intervals with the existing confidence intervals. Simulation results have shown that all three proposed confidence intervals perform well in terms of coverage probability and expected length.
1. Introduction
The coefficient of variation of a distribution is a dimensionless number that quantifies the degree of variability relative to the mean. It is a statistical measure for comparing the dispersion of several variables measured in different units. The population coefficient of variation is defined as the ratio of the population standard deviation to the population mean, $\theta = \sigma/\mu$. The typical sample estimate of $\theta$ is
$$\hat{\theta} = \frac{S}{\bar{X}}, \tag{1}$$
where $S$ is the sample standard deviation (the square root of the unbiased estimator of the population variance) and $\bar{X}$ is the sample mean.
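As a small illustration of the estimate $S/\bar{X}$ and of the claim that the coefficient of variation is dimensionless, the Python sketch below computes it with the standard library (`statistics.stdev` uses the divisor $n-1$, matching the unbiased variance estimator). The helper name `sample_cv` and the three weights are hypothetical illustration values, not data from this paper. Rescaling grams to kilograms leaves the estimate unchanged.

```python
import statistics


def sample_cv(x):
    """Usual sample coefficient of variation S / X-bar; statistics.stdev uses divisor n - 1."""
    return statistics.stdev(x) / statistics.mean(x)


# Illustrative values only (not data from the paper): the same measurements in grams and kilograms.
weights_g = [3800.0, 4200.0, 4600.0]
weights_kg = [w / 1000.0 for w in weights_g]

# Both calls give the same value, because the units cancel in the ratio.
print(sample_cv(weights_g), sample_cv(weights_kg))
```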
The coefficient of variation has been widely used in many areas such as science, medicine, engineering, and economics. For example, Ahn used the coefficient of variation to analyze the uncertainty of fault trees. Gong and Li assessed the strength of ceramics by using the coefficient of variation. Faber and Korn applied the coefficient of variation as a way of including a measure of variation in the mean synaptic response of the central nervous system. The coefficient of variation has also been used to assess the homogeneity of bone test samples to help determine the effect of external treatments on the properties of bones. Billings et al. used the coefficient of variation to study the impact of socioeconomic status on hospital use in New York City. In finance and actuarial science, the coefficient of variation can be used as a measure of relative risk, and a test of the equality of the coefficients of variation can be used to compare two stocks. Furthermore, Pyne et al. studied the variability of the competitive performance of Olympic swimmers by using the coefficient of variation.
Although the point estimator of the population coefficient of variation shown in (1) can be a useful statistical measure, a confidence interval is more informative, because it conveys the precision of the estimate as well as its value (e.g., Smithson, Thompson, and Steiger). Several approaches are available for constructing a confidence interval for $\theta$. McKay proposed a confidence interval for $\theta$ based on the chi-square distribution; this interval works well when $\theta < 0.33$ [13–17]. Later, Vangel proposed a new confidence interval for $\theta$, called the modified McKay confidence interval. It is based on an analysis of the distribution of a class of approximate pivotal quantities for the normal coefficient of variation. The modified McKay interval is closely related to McKay’s interval, but it is usually more accurate and nearly exact under normality. Panichkitkosolkul modified McKay’s confidence interval by replacing the sample coefficient of variation with the maximum likelihood estimator for a normal distribution. Sharma and Krishna derived the asymptotic distribution and a confidence interval for the reciprocal of the coefficient of variation that require no assumptions about the population distribution. Miller discussed the approximate distribution of $\hat{\theta}$ and proposed an approximate confidence interval for $\theta$ in the case of a normal distribution. Ng compared the performance of the confidence intervals for $\theta$ obtained by McKay’s, Miller’s, and Sharma-Krishna’s methods under the same simulation conditions.
Mahmoudvand and Hassani proposed an approximately unbiased estimator of $\theta$ in a normal distribution and used it to construct two approximate confidence intervals for the coefficient of variation. Confidence intervals for $\theta$ in normal and lognormal distributions were proposed by Koopmans et al. and Verrill. Buntao and Niwitpong introduced confidence intervals for the difference of the coefficients of variation of lognormal and delta-lognormal distributions. Curto and Pinto constructed a confidence interval for $\theta$ when the random variables are not independently and identically distributed. More recently, Gulhar et al. compared several confidence intervals for the population coefficient of variation based on parametric, nonparametric, and modified methods.
In some applications, however, the population mean is known. The confidence intervals of the aforementioned authors were not developed for estimating the population coefficient of variation of a normal distribution with a known population mean. Therefore, the main aim of this paper is to propose three confidence intervals for $\theta$ in a normal distribution with a known population mean.
The paper is organized as follows. Section 2 discusses the theoretical background of the proposed confidence intervals. Section 3 presents a Monte Carlo simulation study of their performance. Section 4 compares the confidence intervals on an empirical application. Conclusions are given in the final section.
2. Theoretical Results
In this section, the mean and variance of the estimator of the coefficient of variation in a normal distribution with a known population mean are considered. In addition, we will introduce an unbiased estimator for the coefficient of variation, obtain its variance, and finally construct three confidence intervals: normal approximation, shortest-length, and equal-tailed confidence intervals.
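If the population mean is known to be $\mu$, then the population coefficient of variation is given by $\theta = \sigma/\mu$. The sample estimate of $\theta$ is
$$\hat{\theta} = \frac{S^{*}}{\mu}, \tag{2}$$
where $S^{*} = \sqrt{n^{-1}\sum_{i=1}^{n}(X_i-\mu)^2}$. To find the expectation of (2), we first prove the following lemma.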
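Lemma 1. Let $X_1, \dots, X_n$ be a random sample from a normal distribution with known mean $\mu$ and variance $\sigma^2$, and let $S^{*} = \sqrt{n^{-1}\sum_{i=1}^{n}(X_i-\mu)^2}$. Then
$$E(S^{*}) = c_n\sigma,$$
where $c_n = \sqrt{2/n}\;\Gamma\!\left(\frac{n+1}{2}\right)\big/\Gamma\!\left(\frac{n}{2}\right)$.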
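Proof of Lemma 1. By definition,
$$\frac{nS^{*2}}{\sigma^2} = \sum_{i=1}^{n} Z_i^2, \qquad Z_i = \frac{X_i-\mu}{\sigma}.$$
The $Z_i$ are independent standard normal variables, so by Theorem B of Rice [29, page 197] each $Z_i^2$ has a central chi-square distribution with 1 degree of freedom, and $W = nS^{*2}/\sigma^2$ has a central chi-square distribution with $n$ degrees of freedom; that is, $W \sim \chi^2_n$.
For $W \sim \chi^2_n$ and $r > -n/2$, $E(W^r) = 2^r\,\Gamma(n/2 + r)/\Gamma(n/2)$ [30, page 181]. Taking $r = 1/2$ gives
$$E\left(\sqrt{W}\right) = \sqrt{2}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}.$$
Because $S^{*} = \sigma\sqrt{W/n}$, we obtain $E(S^{*}) = (\sigma/\sqrt{n})\,E(\sqrt{W}) = c_n\sigma$. Next, we find the variance of $S^{*}$. Since $E(S^{*2}) = \sigma^2 E(W)/n = \sigma^2$,
$$\operatorname{Var}(S^{*}) = E(S^{*2}) - \left[E(S^{*})\right]^2 = \sigma^2\left(1 - c_n^2\right).$$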
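Lemma 1 and the variance of the known-mean standard deviation $S^{*} = \sqrt{n^{-1}\sum(X_i-\mu)^2}$ can be sanity-checked by simulation. The sketch below is an illustration in Python rather than the author's R code; the settings $n = 5$, $\mu = 10$, $\sigma = 2$, the seed, and the helper names `c_n`, `s_star`, and `simulate_s_star` are hypothetical choices. It computes $c_n$ on the log-gamma scale to avoid overflow and compares Monte Carlo moments of $S^{*}$ with $c_n\sigma$ and $\sigma^2(1-c_n^2)$.

```python
import math
import random


def c_n(n):
    """c_n = sqrt(2/n) * Gamma((n+1)/2) / Gamma(n/2), evaluated via log-gamma for stability."""
    return math.sqrt(2.0 / n) * math.exp(math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0))


def s_star(x, mu):
    """Known-mean standard deviation sqrt(sum((x_i - mu)^2) / n), with divisor n."""
    return math.sqrt(sum((xi - mu) ** 2 for xi in x) / len(x))


def simulate_s_star(n, mu, sigma, reps, seed=1):
    """Monte Carlo mean and variance of S* over `reps` normal samples of size n."""
    rng = random.Random(seed)
    draws = [s_star([rng.gauss(mu, sigma) for _ in range(n)], mu) for _ in range(reps)]
    m = sum(draws) / reps
    v = sum((d - m) ** 2 for d in draws) / (reps - 1)
    return m, v


n, mu, sigma = 5, 10.0, 2.0  # illustrative settings, not taken from the paper
m, v = simulate_s_star(n, mu, sigma, reps=20000)
print(m, c_n(n) * sigma)                   # Monte Carlo E(S*) vs. c_n * sigma
print(v, sigma ** 2 * (1 - c_n(n) ** 2))   # Monte Carlo Var(S*) vs. sigma^2 (1 - c_n^2)
```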
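By using Lemma 1, we can show that the mean and variance of $\hat{\theta}$ are
$$E(\hat{\theta}) = c_n\theta, \tag{10}$$
$$\operatorname{Var}(\hat{\theta}) = \theta^2\left(1 - c_n^2\right).$$
Note that $c_n \to 1$ as $n \to \infty$. Therefore, it follows that
$$\lim_{n\to\infty} E(\hat{\theta}) = \theta, \qquad \lim_{n\to\infty} \operatorname{Var}(\hat{\theta}) = 0.$$
This means that $\hat{\theta}$ is asymptotically unbiased and consistent for $\theta$. From (10), the unbiased estimator of $\theta$ is $\tilde{\theta} = \hat{\theta}/c_n$. Using Lemma 1, the mean and variance of $\tilde{\theta}$ are given by
$$E(\tilde{\theta}) = \theta, \qquad \operatorname{Var}(\tilde{\theta}) = \frac{\theta^2\left(1 - c_n^2\right)}{c_n^2}. \tag{15}$$
Thus, $\lim_{n\to\infty}\operatorname{Var}(\tilde{\theta}) = 0$. Hence, $\tilde{\theta}$ is also consistent for $\theta$. Next, we examine the accuracy of $\tilde{\theta}$ from another point of view. Let us first consider the following theorem.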
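The bias correction from $\hat{\theta} = S^{*}/\mu$ to $\tilde{\theta} = \hat{\theta}/c_n$ can be illustrated with a short Python sketch. The function names, the seed, and the settings $n = 5$, $\mu = 10$, $\theta = 0.2$ are hypothetical; $\mu > 0$ is assumed so that $\theta > 0$. The Monte Carlo averages should sit near $c_n\theta$ for $\hat{\theta}$ and near $\theta$ for $\tilde{\theta}$.

```python
import math
import random


def c_n(n):
    """c_n = sqrt(2/n) * Gamma((n+1)/2) / Gamma(n/2), evaluated via log-gamma."""
    return math.sqrt(2.0 / n) * math.exp(math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0))


def cv_estimates(x, mu):
    """Return (theta_hat, theta_tilde) for a sample x with known mean mu > 0.

    theta_hat = S*/mu has mean c_n * theta; theta_tilde = theta_hat / c_n is unbiased."""
    n = len(x)
    theta_hat = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n) / mu
    return theta_hat, theta_hat / c_n(n)


# Monte Carlo check with illustrative settings (not the paper's simulation design).
rng = random.Random(2)
n, mu, theta = 5, 10.0, 0.2
sigma = theta * mu
reps = 20000
hats, tildes = [], []
for _ in range(reps):
    h, t = cv_estimates([rng.gauss(mu, sigma) for _ in range(n)], mu)
    hats.append(h)
    tildes.append(t)
mean_hat = sum(hats) / reps
mean_tilde = sum(tildes) / reps
print(mean_hat, c_n(n) * theta)  # theta_hat is biased downward for small n
print(mean_tilde, theta)         # theta_tilde averages close to theta
```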
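Theorem 2. Let $X_1, \dots, X_n$ be a random sample from a probability density function $f(x;\theta)$ with unknown parameter $\theta$. If $T$ is an unbiased estimator of $\theta$, it can be shown under very general regularity conditions that the variance of $T$ must satisfy the inequality
$$\operatorname{Var}(T) \ge \frac{1}{nI(\theta)},$$
where $I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^2\right]$ is the Fisher information. This is known as the Cramér-Rao inequality. If $\operatorname{Var}(T) = 1/(nI(\theta))$, the estimator $T$ is said to be efficient.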
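By setting
$$f(x;\theta) = \frac{1}{\sqrt{2\pi}\,\theta\mu}\exp\left\{-\frac{(x-\mu)^2}{2\theta^2\mu^2}\right\}$$
(that is, $\sigma = \theta\mu$ with $\mu > 0$) in Theorem 2, it is easy to show that $I(\theta) = 2/\theta^2$, so that
$$\operatorname{Var}(T) \ge \frac{\theta^2}{2n}, \tag{18}$$
where $T$ is any unbiased estimator of $\theta$. This means that the variance of an efficient estimator of $\theta$ is $\theta^2/(2n)$.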
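The "easy to show" step can be written out. The derivation below assumes the normal density with $\sigma = \theta\mu$ and $\mu > 0$, which is our reading of the parameterization intended here, and uses only the facts $E(Z^2) = 1$ and $E(Z^4) = 3$ for a standard normal $Z$:

$$
\begin{aligned}
\ln f(x;\theta) &= -\tfrac{1}{2}\ln(2\pi) - \ln(\theta\mu) - \frac{(x-\mu)^2}{2\theta^2\mu^2},\\
\frac{\partial}{\partial\theta}\ln f(X;\theta) &= -\frac{1}{\theta} + \frac{(X-\mu)^2}{\theta^3\mu^2} = \frac{Z^2-1}{\theta}, \qquad Z = \frac{X-\mu}{\theta\mu} \sim N(0,1),\\
I(\theta) &= \frac{E\left[(Z^2-1)^2\right]}{\theta^2} = \frac{\operatorname{Var}(Z^2)}{\theta^2} = \frac{3-1}{\theta^2} = \frac{2}{\theta^2},
\qquad \frac{1}{nI(\theta)} = \frac{\theta^2}{2n}.
\end{aligned}
$$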
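From (15), we now show that $\lim_{n\to\infty} n\operatorname{Var}(\tilde{\theta}) = \theta^2/2$. The asymptotic expansion of the gamma function ratio is [32]
$$\frac{\Gamma\left(j + \frac{1}{2}\right)}{\Gamma(j)} = \sqrt{j}\left(1 - \frac{1}{8j} + \frac{1}{128j^2} + O\left(j^{-3}\right)\right). \tag{19}$$
Now, setting $j = n/2$ in (19), we have
$$c_n = 1 - \frac{1}{4n} + \frac{1}{32n^2} + O\left(n^{-3}\right), \qquad c_n^2 = 1 - \frac{1}{2n} + \frac{1}{8n^2} + O\left(n^{-3}\right).$$
Thus, we obtain
$$n\operatorname{Var}(\tilde{\theta}) = n\theta^2\,\frac{1 - c_n^2}{c_n^2} = \theta^2\left(\frac{1}{2} + \frac{1}{8n} + O\left(n^{-2}\right)\right) \longrightarrow \frac{\theta^2}{2}.$$
Therefore, $\operatorname{Var}(\tilde{\theta}) \approx \theta^2/(2n)$ for large $n$, which is the lower bound in (18). This means that $\tilde{\theta}$ is asymptotically efficient. In the following subsections, three confidence intervals for $\theta$ are proposed.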
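A quick numerical check of the asymptotic argument: the Python sketch below compares $c_n$ with its truncated expansion $1 - 1/(4n) + 1/(32n^2)$ and evaluates $n(1-c_n^2)/c_n^2$, which is $n\operatorname{Var}(\tilde{\theta})/\theta^2$ and should approach $1/2$. The helper names and the chosen values of $n$ are illustrative only.

```python
import math


def c_n(n):
    """c_n = sqrt(2/n) * Gamma((n+1)/2) / Gamma(n/2), via log-gamma."""
    return math.sqrt(2.0 / n) * math.exp(math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0))


def c_n_expansion(n):
    """Truncated expansion 1 - 1/(4n) + 1/(32 n^2), i.e. the gamma ratio with j = n/2; error O(n^-3)."""
    return 1.0 - 1.0 / (4.0 * n) + 1.0 / (32.0 * n * n)


def scaled_variance(n):
    """n * Var(theta_tilde) / theta^2 = n (1 - c_n^2) / c_n^2, which should approach 1/2."""
    c = c_n(n)
    return n * (1.0 - c * c) / (c * c)


for n in (5, 25, 100, 1000):
    print(n, c_n(n), c_n_expansion(n), scaled_variance(n))
```

The values of `scaled_variance` decrease toward 0.5 roughly like $1/2 + 1/(8n)$, in line with the expansion above.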
2.1. Normal Approximation Confidence Interval
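Using the normal approximation, we have
$$\frac{\tilde{\theta} - \theta}{\sqrt{\operatorname{Var}(\tilde{\theta})}} \;\overset{\text{approx}}{\sim}\; N(0,1). \tag{22}$$
Therefore, replacing $\theta$ by $\tilde{\theta}$ in (15), the $100(1-\alpha)\%$ confidence interval for $\theta$ based on (22) is
$$\left[\tilde{\theta} - z_{1-\alpha/2}\,\tilde{\theta}\sqrt{\frac{1 - c_n^2}{c_n^2}},\;\; \tilde{\theta} + z_{1-\alpha/2}\,\tilde{\theta}\sqrt{\frac{1 - c_n^2}{c_n^2}}\right],$$
where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.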
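A minimal Python sketch of a Wald-type normal approximation interval, assuming the plug-in form $\tilde{\theta} \pm z_{1-\alpha/2}\,\tilde{\theta}\sqrt{(1-c_n^2)/c_n^2}$, that is, the variance $\theta^2(1-c_n^2)/c_n^2$ of $\tilde{\theta}$ with $\theta$ replaced by $\tilde{\theta}$. The plug-in form, the function name `normal_approx_ci`, and the five-point sample are our assumptions for illustration; the percentile comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def c_n(n):
    """c_n = sqrt(2/n) * Gamma((n+1)/2) / Gamma(n/2), via log-gamma."""
    return math.sqrt(2.0 / n) * math.exp(math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0))


def normal_approx_ci(x, mu, conf=0.95):
    """Wald-type interval for theta with known mean mu > 0 (plug-in variance is an assumption).

    theta_tilde = (S*/mu) / c_n, and Var(theta_tilde) = theta^2 (1 - c_n^2) / c_n^2 is
    estimated by substituting theta_tilde for theta."""
    n = len(x)
    c = c_n(n)
    s_star = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n)
    theta_tilde = s_star / mu / c
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    half_width = z * theta_tilde * math.sqrt((1.0 - c * c) / (c * c))
    return theta_tilde - half_width, theta_tilde + half_width


# Illustrative sample with an assumed known mean of 10 (not data from the paper).
print(normal_approx_ci([9.0, 11.0, 10.0, 8.5, 12.0], mu=10.0))
```

The interval is symmetric about $\tilde{\theta}$ by construction; for very small $n$ or large $z$ its lower limit can fall below zero.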
2.2. Shortest-Length Confidence Interval
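A pivotal quantity for $\theta$ is
$$Q = \frac{n\hat{\theta}^2}{\theta^2} = \frac{nS^{*2}}{\sigma^2} \sim \chi^2_n.$$
Converting the statement $P(a \le Q \le b) = 1 - \alpha$, and taking $\mu > 0$ so that $\theta > 0$, we can write
$$P\left(\hat{\theta}\sqrt{\frac{n}{b}} \le \theta \le \hat{\theta}\sqrt{\frac{n}{a}}\right) = 1 - \alpha.$$
Thus, the $100(1-\alpha)\%$ confidence interval for $\theta$ based on the pivotal quantity $Q$ is
$$\left[\hat{\theta}\sqrt{\frac{n}{b}},\;\; \hat{\theta}\sqrt{\frac{n}{a}}\right],$$
where $0 < a < b$, and the length of the confidence interval for $\theta$ is
$$L = \hat{\theta}\sqrt{n}\left(\frac{1}{\sqrt{a}} - \frac{1}{\sqrt{b}}\right).$$
In order to find the shortest-length confidence interval for $\theta$, the following problem has to be solved:
$$\min_{a,\,b}\left(\frac{1}{\sqrt{a}} - \frac{1}{\sqrt{b}}\right) \quad \text{subject to} \quad \int_a^b f_n(x)\,dx = 1 - \alpha,$$
where $f_n$ is the probability density function of the central chi-square distribution with $n$ degrees of freedom. Following Casella and Berger [33, pages 443-444], the shortest-length confidence interval for $\theta$ based on $Q$ is determined by the values of $a$ and $b$ satisfying
$$\int_a^b f_n(x)\,dx = 1 - \alpha \quad \text{and} \quad a^{3/2}f_n(a) = b^{3/2}f_n(b).$$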
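The two conditions for the shortest-length interval, $\int_a^b f_n = 1-\alpha$ and $a^{3/2}f_n(a) = b^{3/2}f_n(b)$, can be solved numerically. The Python sketch below is one possible approach, not the author's R implementation. To stay within the standard library, it uses a hand-written chi-square CDF (the power series of the regularized lower incomplete gamma function) and a bisection quantile. It then parameterizes $a$ and $b$ by $p = F(a) \in (0, \alpha)$ and bisects on the sign of $a^{3/2}f(a) - b^{3/2}f(b)$. All function names and the inputs $\hat{\theta} = 0.2$, $n = 25$ are hypothetical, and the helpers are intended for moderate degrees of freedom.

```python
import math


def chi2_pdf(x, k):
    """Density of the central chi-square distribution with k degrees of freedom, x > 0."""
    return math.exp((k / 2.0 - 1.0) * math.log(x) - x / 2.0
                    - (k / 2.0) * math.log(2.0) - math.lgamma(k / 2.0))


def chi2_cdf(x, k):
    """CDF via the power series of the regularized lower incomplete gamma P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, z = k / 2.0, x / 2.0
    term = total = 1.0 / s
    m = 1
    while term > 1e-17 * total and m < 100000:
        term *= z / (s + m)
        total += term
        m += 1
    return min(1.0, math.exp(s * math.log(z) - z - math.lgamma(s)) * total)


def chi2_ppf(p, k):
    """Quantile of the chi-square distribution by bisection on chi2_cdf, 0 < p < 1."""
    lo, hi = 0.0, k + 10.0 * math.sqrt(2.0 * k) + 10.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shortest_ab(k, alpha):
    """Solve F(b) - F(a) = 1 - alpha and a^{3/2} f(a) = b^{3/2} f(b).

    a and b are parameterized by p = F(a) in (0, alpha); g(p) is negative when a is near 0
    and positive when b is far in the upper tail, so bisection on p finds a root."""
    def ab(p):
        return chi2_ppf(p, k), chi2_ppf(p + 1.0 - alpha, k)

    def g(p):
        a, b = ab(p)
        return a ** 1.5 * chi2_pdf(a, k) - b ** 1.5 * chi2_pdf(b, k)

    lo, hi = 1e-10, alpha - 1e-10
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return ab(0.5 * (lo + hi))


def shortest_length_ci(theta_hat, n, conf=0.95):
    """Interval [theta_hat * sqrt(n / b), theta_hat * sqrt(n / a)] with the optimal (a, b)."""
    a, b = shortest_ab(n, 1.0 - conf)
    return theta_hat * math.sqrt(n / b), theta_hat * math.sqrt(n / a)


print(shortest_length_ci(0.2, 25))  # theta_hat = 0.2 and n = 25 are illustrative inputs
```

Because $a$ and $b$ depend only on $n$ and $\alpha$, they can be computed once per design point and reused across simulated samples.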
2.3. Equal-Tailed Confidence Interval
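The equal-tailed $100(1-\alpha)\%$ confidence interval for $\theta$ based on the pivotal quantity $n\hat{\theta}^2/\theta^2 \sim \chi^2_n$ is
$$\left[\hat{\theta}\sqrt{\frac{n}{\chi^2_{1-\alpha/2,\,n}}},\;\; \hat{\theta}\sqrt{\frac{n}{\chi^2_{\alpha/2,\,n}}}\right],$$
where $\chi^2_{\alpha/2,\,n}$ and $\chi^2_{1-\alpha/2,\,n}$ are the $100(\alpha/2)$th and $100(1-\alpha/2)$th percentiles of the central chi-square distribution with $n$ degrees of freedom, respectively.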
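The equal-tailed interval needs only two chi-square percentiles. The Python sketch below uses the same kind of standard-library chi-square CDF and bisection quantile as a stand-in for a statistics library; in R one would call `qchisq`, which we mention only for orientation. The function name `equal_tailed_ci` and the inputs $\hat{\theta} = 0.2$, $n = 25$ are hypothetical.

```python
import math


def chi2_cdf(x, k):
    """CDF via the power series of the regularized lower incomplete gamma P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, z = k / 2.0, x / 2.0
    term = total = 1.0 / s
    m = 1
    while term > 1e-17 * total and m < 100000:
        term *= z / (s + m)
        total += term
        m += 1
    return min(1.0, math.exp(s * math.log(z) - z - math.lgamma(s)) * total)


def chi2_ppf(p, k):
    """Quantile of the chi-square distribution by bisection on chi2_cdf, 0 < p < 1."""
    lo, hi = 0.0, k + 10.0 * math.sqrt(2.0 * k) + 10.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equal_tailed_ci(theta_hat, n, conf=0.95):
    """[theta_hat * sqrt(n / chi2_{1-alpha/2,n}), theta_hat * sqrt(n / chi2_{alpha/2,n})]."""
    alpha = 1.0 - conf
    lower_q = chi2_ppf(alpha / 2.0, n)
    upper_q = chi2_ppf(1.0 - alpha / 2.0, n)
    return theta_hat * math.sqrt(n / upper_q), theta_hat * math.sqrt(n / lower_q)


print(equal_tailed_ci(0.2, 25))  # illustrative inputs
```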
3. Simulation Study
A Monte Carlo simulation was conducted using the R statistical software [34–36], version 3.0.1, to estimate the coverage probabilities and expected lengths of the three proposed confidence intervals and to compare them with those of the existing confidence intervals. The estimated coverage probability and the expected length (based on $M$ replicates) are given by
$$\widehat{CP} = \frac{\#(L \le \theta \le U)}{M}, \qquad \widehat{EL} = \frac{1}{M}\sum_{j=1}^{M}\left(U_j - L_j\right),$$
where $\#(L \le \theta \le U)$ denotes the number of simulation runs for which the population coefficient of variation $\theta$ lies within the confidence interval $[L, U]$. The data were generated from a normal distribution with a known population mean, with $\theta$ = 0.05, 0.10, 0.20, 0.33, 0.50, and 0.67 and sample sizes $n$ = 5, 10, 15, 25, 50, and 100. The number of simulation runs $M$ is 50,000, and the nominal confidence levels are 0.90 and 0.95. Three existing confidence intervals are considered, namely, Miller’s, McKay’s, and Vangel’s.
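The coverage and length estimators can be sketched in a few lines of Python for a single design point. This illustration is not the author's R code. It uses the equal-tailed interval because that interval needs only two chi-square quantiles, which are hard-coded here from standard tables for $n = 10$ and a 95% level. The settings $\mu = 10$, $\theta = 0.2$, 20,000 runs (fewer than the paper's 50,000), the seed, and the function name `simulate_equal_tailed` are hypothetical. Since the equal-tailed interval is exact under the model, the estimated coverage should be close to 0.95.

```python
import math
import random

# 0.025 and 0.975 quantiles of the central chi-square distribution with 10 degrees of
# freedom (standard table values, rounded); they make this sketch specific to n = 10.
Q_LOWER_10 = 3.246973
Q_UPPER_10 = 20.483177


def simulate_equal_tailed(n, mu, theta, q_lower, q_upper, runs, seed=0):
    """Return (estimated coverage probability, estimated expected length) over `runs` samples.

    CP = #(L <= theta <= U) / M and EL = (1/M) * sum(U_j - L_j), with M = runs."""
    rng = random.Random(seed)
    sigma = theta * mu  # known-mean normal model with sigma = theta * mu, mu > 0
    covered = 0
    total_length = 0.0
    for _ in range(runs):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        theta_hat = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n) / mu
        lower = theta_hat * math.sqrt(n / q_upper)
        upper = theta_hat * math.sqrt(n / q_lower)
        if lower <= theta <= upper:
            covered += 1
        total_length += upper - lower
    return covered / runs, total_length / runs


# Illustrative design point: n = 10, mu = 10, theta = 0.2, 20,000 runs.
cp, el = simulate_equal_tailed(10, 10.0, 0.2, Q_LOWER_10, Q_UPPER_10, runs=20000, seed=3)
print(cp, el)
```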
The upper limit of McKay’s interval has to be set to $\infty$ when the quantity under its square root is nonpositive, and the same rule applies to the upper limit of Vangel’s interval.
As can be seen from Tables 2 and 3, the three proposed confidence intervals have estimated coverage probabilities close to the nominal confidence level in all cases. In contrast, the estimated coverage probabilities of Miller’s, McKay’s, and Vangel’s confidence intervals differ markedly from the nominal confidence level, especially when the population coefficient of variation is large; they tend to be too high. Moreover, the estimated coverage probabilities of the existing confidence intervals increase with $\theta$ (e.g., for the 95% McKay interval, 0.9522 for $\theta$ = 0.05, 0.9539 for $\theta$ = 0.10, and 0.9856 for $\theta$ = 0.67). By contrast, Figure 1 shows that the estimated coverage probabilities of the three proposed confidence intervals do not change systematically with $\theta$.
As can be seen from Figure 2, McKay’s and Vangel’s confidence intervals have longer expected lengths than Miller’s and the proposed confidence intervals, and the expected lengths of the three proposed confidence intervals are shorter than those of the existing ones in almost all cases. In addition, the expected lengths decrease as the sample size increases (e.g., for the 95% shortest-length confidence interval with $\theta$ = 0.20, the expected length falls from 0.1553 at a smaller $n$ to 0.0949 for $n$ = 25 and 0.0665 for $n$ = 50).
4. An Empirical Application
To illustrate the application of the confidence intervals proposed in Section 2, we used the weights (in grams) of 61 one-month-old infants, listed as follows:
The data are taken from the study by Ziegler et al. (cited in Ledolter and Hogg, page 287). The histogram, density plot, box-and-whisker plot, and normal quantile-quantile plot are displayed in Figure 3. Algorithm 1 shows the result of the Shapiro-Wilk normality test.
Figure 3 and Algorithm 1 indicate that the data are consistent with a normal distribution. Based on past research, we assume that the population mean weight of one-month-old infants is about 4400 grams. The unbiased estimate $\tilde{\theta} = \hat{\theta}/c_n$ of the coefficient of variation is computed from these data, and the 95% proposed and existing confidence intervals for the coefficient of variation are reported in Table 4. In this example, the three proposed confidence intervals are shorter than the existing ones, which agrees with the simulation results on expected length.
5. Conclusions
The coefficient of variation is the ratio of the standard deviation to the mean and provides a widely used unit-free measure of dispersion. It is useful for comparing the variability of groups of observations. In this paper, three confidence intervals for the coefficient of variation in a normal distribution with a known population mean have been developed and compared with Miller’s, McKay’s, and Vangel’s confidence intervals through a Monte Carlo simulation study. The normal approximation, shortest-length, and equal-tailed confidence intervals performed better than the existing confidence intervals in terms of expected length and closeness of the estimated coverage probability to the nominal confidence level.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
The author is grateful to Professor Dr. Tonghui Wang, Professor Dr. John J. Borkowski, and anonymous referees for their valuable comments and suggestions, which have significantly enhanced the quality and presentation of this paper.
J. Gong and Y. Li, “Relationship between the Estimated Weibull Modulus and the Coefficient of Variation of the Measured Strength for Ceramics,” Journal of the American Ceramic Society, vol. 82, no. 2, pp. 449–452, 1999.
E. G. Miller and M. J. Karson, “Testing the equality of two coefficients of variation,” in American Statistical Association: Proceedings of the Business and Economics Section, Part I, pp. 278–283, 1977.
B. Thompson, “What future quantitative social science research could look like: confidence intervals for effect sizes,” Educational Researcher, vol. 31, no. 3, pp. 25–32, 2002.
E. C. Fieller, “A numerical test of the adequacy of A.T. McKay's approximation,” Journal of the Royal Statistical Society, vol. 95, no. 4, pp. 699–702, 1932.
B. Iglewicz, Some properties of the coefficient of variation [Ph.D. thesis], Virginia Polytechnic Institute, Blacksburg, Va, USA, 1967.
B. Iglewicz and R. H. Myers, “Comparisons of approximations to the percentage points of the sample coefficient of variation,” Technometrics, vol. 12, no. 1, pp. 166–169, 1970.
E. S. Pearson, “Comparison of A.T. McKay's approximation with experimental sampling results,” Journal of the Royal Statistical Society, vol. 95, no. 4, pp. 703–704, 1932.
G. J. Umphrey, “A comment on McKay's approximation for the coefficient of variation,” Communications in Statistics-Simulation and Computation, vol. 12, no. 5, pp. 629–635, 1983.
M. G. Vangel, “Confidence intervals for a normal coefficient of variation,” American Statistician, vol. 50, no. 1, pp. 21–26, 1996.
W. Panichkitkosolkul, “Improved confidence intervals for a coefficient of variation of a normal distribution,” Thailand Statistician, vol. 7, no. 2, pp. 193–199, 2009.
E. G. Miller, “Asymptotic test statistics for coefficient of variation,” Communications in Statistics-Theory and Methods, vol. 20, no. 10, pp. 3351–3363, 1991.
K. C. Ng, “Performance of three methods of interval estimation of the coefficient of variation,” InterStat, 2006, http://interstat.statjournals.net/YEAR/2006/articles/0609002.pdf.
L. H. Koopmans, D. B. Owen, and J. I. Rosenblatt, “Confidence intervals for the coefficient of variation for the normal and lognormal distributions,” Biometrika, vol. 51, no. 1-2, pp. 25–32, 1964.
S. Verrill, “Confidence bounds for normal and log-normal distribution coefficient of variation,” Research Paper EPL-RP-609, U. S. Department of Agriculture, Madison, Wis, USA, 2003.
N. Buntao and S. Niwitpong, “Confidence intervals for the difference of coefficients of variation for lognormal distributions and delta-lognormal distributions,” Applied Mathematical Sciences, vol. 6, no. 134, pp. 6691–6704, 2012.
M. Gulhar, B. M. G. Kibria, A. N. Albatineh, and N. U. Ahmed, “A comparison of some confidence intervals for estimating the population coefficient of variation: a simulation study,” SORT, vol. 36, no. 1, pp. 45–68, 2012.
J. A. Rice, Mathematical Statistics and Data Analysis, Duxbury Press, Belmont, Calif, USA, 2006.
S. F. Arnold, Mathematical Statistics, Prentice-Hall, New Jersey, NJ, USA, 1990.
E. J. Dudewicz and S. N. Mishra, Modern Mathematical Statistics, John Wiley & Sons, Singapore, 1988.
R. L. Graham, D. E. Knuth, and O. Patashnik, Answer to Problem 9.60 in Concrete Mathematics: A Foundation for Computer Science, Addison-Wesley, Reading, Mass, USA, 1994.
G. Casella and R. L. Berger, Statistical Inference, Duxbury Press, California, Calif, USA, 2001.
R. Ihaka and R. Gentleman, “R: a language for data analysis and graphics,” Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299–314, 1996.
R Development Core Team, An Introduction to R, R Foundation for Statistical Computing, Vienna, Austria, 2013.
R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013.
E. Ziegler, S. E. Nelson, and J. M. Jeter, Early Iron Supplementation of Breastfed Infants, Department of Pediatrics, University of Iowa, Iowa City, Iowa, USA, 2007.
J. Ledolter and R. V. Hogg, Applied Statistics for Engineers and Physical Scientists, Pearson, New Jersey, NJ, USA, 2010.