Increased Statistical Efficiency in a Lognormal Mean Model
Within the context of clinical and other scientific research, a substantial need exists for an accurate point estimate in a lognormal mean model, given that highly skewed data are often present. As such, logarithmic transformations are often advocated to satisfy the assumptions of parametric statistical inference. Despite this, existing approaches that utilize only a sample’s mean and variance may not yield the most efficient estimator. The current investigation developed and tested a more efficient point estimator for a lognormal mean by capturing more complete information via the sample’s coefficient of variation. Results of an empirical simulation study across varying sample sizes and population standard deviations indicated relative efficiencies of up to 129.47 percent compared with the usual maximum likelihood estimator and up to 21.33 absolute percentage points above the efficient estimator presented by Shen and colleagues (2006). The relative efficiency of the proposed estimator increased particularly as a function of decreasing sample size and increasing population standard deviation.
The presence of highly skewed data is commonplace across both basic and applied sciences [1, 2]. In certain instances, such data may be log-transformed with the primary purpose of achieving approximate normality and stabilizing the variance, which may include removing heteroskedasticity in the process, so as to satisfy the assumptions of parametric statistical inference. Patterson provided seminal work concerning the statistical challenges involved in estimating the population mean following the transformation of data. More recently, Shen et al. proposed an improved, efficient minimum risk (minimum relative mean squared error, RMSE) estimator of the lognormal mean, and numerous researchers have addressed the estimation of parameters within this distribution from both frequentist and Bayesian perspectives [5–9].
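In presenting the lognormal distribution from a more fundamental perspective, if $Y$ is a random variable with a lognormal distribution and a mean of $\theta$, then $X = \ln Y$ will be normally distributed with a mean of $\mu$ and a variance of $\sigma^2$. Therefore, $Y$ may also be expressed as $Y = e^{X}$ with a mean of $\theta$, observing that

$$\theta = E(Y) = \exp\left(\mu + \frac{\sigma^2}{2}\right). \tag{1}$$

As such, when considering a random sample $Y_1, \ldots, Y_n$ that is i.i.d. lognormal, and given $X_i = \ln Y_i$, then $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$. The following may also be defined:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{x}\right)^2, \tag{2}$$

noting that $\bar{x}$ and $s^2$ are the maximum likelihood estimators (MLE) for $\mu$ and $\sigma^2$, respectively. By applying (1) to (2), the usual estimator (UER) for $\theta$ is

$$\hat{\theta}_{\mathrm{UER}} = \exp\left(\bar{x} + \frac{s^2}{2}\right). \tag{3}$$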
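The plug-in computation in (1) to (3) can be sketched in Python as a minimal illustration using only the standard library. The function names `mle_log_moments` and `uer` are hypothetical, and the simulated parameter values are arbitrary choices for demonstration, not values from the study.

```python
import math
import random


def mle_log_moments(y):
    """Return (xbar, s2), the MLEs of mu and sigma^2 computed from X = ln(Y).

    s2 uses divisor n (the maximum likelihood estimator), as in (2).
    """
    x = [math.log(v) for v in y]
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / n
    return xbar, s2


def uer(y):
    """Usual estimator (UER) of the lognormal mean: exp(xbar + s2 / 2), as in (3)."""
    xbar, s2 = mle_log_moments(y)
    return math.exp(xbar + s2 / 2)


if __name__ == "__main__":
    rng = random.Random(2014)
    mu, sigma, n = 1.0, 0.5, 25  # illustrative values only
    # lognormvariate(mu, sigma) returns Y with ln(Y) ~ N(mu, sigma^2)
    y = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    theta = math.exp(mu + sigma ** 2 / 2)  # true lognormal mean, per (1)
    print(f"theta = {theta:.4f}, UER = {uer(y):.4f}")
```

The sample mean and variance are computed on the log scale; only the final exponentiation returns the estimate to the original scale.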
As noted previously, Shen et al. proposed a new estimator for $\theta$ by minimizing its relative mean squared error (RMSE) through an application of the delta method, which yields an approximation to the RMSE when applied to (3). Notably, the authors considered a class of estimators indexed by a single constant. By minimizing the RMSE of the estimators in this class up to a given order in $n$, the optimal value of that constant was obtained, and the optimal estimator in the class was identified. By substituting the usual unbiased estimate of $\sigma^2$, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{x})^2$, a novel minimum risk (minimum RMSE) estimator of $\theta$ was developed, which may be written in two equivalent forms.
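Shen et al. minimize an approximate RMSE via the delta method. As a complementary numerical illustration only, the sketch below assumes the class has the form $\exp(\bar{x} + cS^2)$ (an assumption, since the class itself is not reproduced above) and evaluates the exact RMSE of each member from standard results: $\bar{x} \sim N(\mu, \sigma^2/n)$ independently of $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$, so the normal and chi-square moment generating functions give closed forms. The constant is then chosen by grid search. The function names are hypothetical, and the resulting constant is not claimed to equal the value derived by Shen et al.

```python
import math


def _mgf_s2(a, sigma, n):
    """E[exp(a * S2)] where (n - 1) * S2 / sigma^2 ~ chi-square with n - 1 df."""
    k = n - 1
    base = 1.0 - 2.0 * a * sigma ** 2 / k
    if base <= 0.0:
        raise ValueError("moment does not exist for this constant")
    return base ** (-k / 2.0)


def exact_rmse(c, sigma, n):
    """Relative MSE E[(est - theta)^2] / theta^2 of est = exp(xbar + c * S2).

    Assumed class form (not reproduced in the text): est = exp(xbar + c * S2).
    Uses independence of xbar and S2; the result does not depend on mu.
    """
    # E[est^2] / theta^2 and E[est] / theta after cancelling the exp(mu) factors
    second = math.exp(2.0 * sigma ** 2 / n - sigma ** 2) * _mgf_s2(2.0 * c, sigma, n)
    first = math.exp(sigma ** 2 / (2.0 * n) - sigma ** 2 / 2.0) * _mgf_s2(c, sigma, n)
    return second - 2.0 * first + 1.0


def best_constant(sigma, n, step=0.001, c_max=1.0):
    """Grid-search the constant c in [0, c_max] that minimizes exact_rmse."""
    grid = [i * step for i in range(int(round(c_max / step)) + 1)]
    # keep only constants for which the second moment of est exists
    feasible = [c for c in grid if 1.0 - 4.0 * c * sigma ** 2 / (n - 1) > 0.0]
    return min(feasible, key=lambda c: exact_rmse(c, sigma, n))


if __name__ == "__main__":
    for n in (10, 25, 50):
        c_star = best_constant(sigma=1.0, n=n)
        print(n, round(c_star, 3), round(exact_rmse(c_star, 1.0, n), 5))
```

Using exact moments rather than simulation keeps the comparison across constants free of Monte Carlo noise; the delta-method approach instead yields a closed-form constant valid to a stated order in $n$.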
Given the prevalence of log-transformed data across research disciplines and the importance of estimating such data efficiently, the purpose of the current study was to derive and assess an approach for obtaining statistically efficient estimators of the lognormal mean. More specifically, the objective was to incorporate more comprehensive information from the sample coefficient of variation of the log-transformed sample data in estimating the untransformed population mean of the original distribution.
2. Preparatory Improvement
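By the Rao-Blackwell theorem, together with the Lehmann-Scheffé theorem, because the sample mean and sample variance are jointly complete sufficient statistics for $(\mu, \sigma^2)$, any function of them that is an unbiased estimator of a population parameter is the uniformly minimum variance unbiased estimator (UMVUE) of that parameter; analogously, such a function that minimizes the mean squared error uniformly over a given class of estimators is the uniformly minimum mean squared error estimator (UMMSE) within that class. The usual estimator (UER) presented in (3) is, by the invariance property of maximum likelihood estimation, the MLE of $\theta$ (i.e., the population mean of the original lognormal distribution). Furthermore, it should be noted that $n s^2/\sigma^2 = \sum_{i=1}^{n}(X_i - \bar{x})^2/\sigma^2$ follows a $\chi^2$ distribution with $n-1$ degrees of freedom.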
Numerous empirical analyses involve conditions wherein small values of the sample coefficient of variation are observed. In such cases, an alternate estimator of the population mean, $\mu$, appearing in Lovric and Sahai and defined in (10), may be used rather than the typical estimator, $\bar{x}$. Applying the principles outlined in Nikulin, the relative efficiency (a key measure of an estimator’s optimality) of the alternate estimator versus $\bar{x}$ can be expressed as a percentage. The UMVUE of the efficiency ratio, as a function of $(\bar{x}, s^2)$, may then be determined, given that $(\bar{x}, s^2)$ is a complete sufficient statistic for $(\mu, \sigma^2)$.

In explaining the aforementioned in more detail, particularly the development of its constituent terms, suppose that $h(\bar{x}, s^2)$ is a function of the sample mean and the sample variance for a random sample from a normal population. Because $\bar{x}$ and $s^2$ are independently distributed, $E\{h(\bar{x}, s^2)\}$ may be regarded as a “two-phase” exercise. In the first phase, the expectation is taken with respect to the random variable $\bar{x}$, treating the “silenced” (i.e., pseudo) variable $s^2$ as a constant. In the second phase, the expectation of the resulting quantity is taken with respect to the random variable $s^2$. Cumulatively, therefore, $E\{h(\bar{x}, s^2)\} = E_{s^2}\left[E_{\bar{x}}\{h(\bar{x}, s^2)\}\right]$. Applied specifically to (12), the first-phase expectation may be evaluated using integration by parts. Notably, one of the resulting terms vanishes by the well-known property of definite integrals that an integrable odd function integrates to zero over a range symmetric about the origin. The remainder of the derivation then follows by way of integration by parts.
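The “two-phase” (iterated) expectation can be checked numerically. The sketch below uses a hypothetical test function $h(\bar{x}, s^2) = \bar{x}^2 s^2$, chosen only because both phases have simple closed forms: phase one gives $E_{\bar{x}}(\bar{x}^2)\,s^2 = (\mu^2 + \sigma^2/n)\,s^2$, and phase two uses $E(s^2) = (n-1)\sigma^2/n$ for the divisor-$n$ variance. It also checks that $E\{(\bar{x}-\mu)\,s^2\} = 0$, an example of a term that vanishes because its integrand is odd in $\bar{x}-\mu$. This illustrates the principle only; it is not the UMVUE derivation in (12).

```python
import random


def two_phase_expectation(mu, sigma, n):
    """E[xbar^2 * s2] for the hypothetical h = xbar^2 * s2, by iterated expectation.

    Phase 1, over xbar with s2 held fixed: E[xbar^2] * s2 = (mu^2 + sigma^2 / n) * s2.
    Phase 2, over s2: E[s2] = (n - 1) * sigma^2 / n for the divisor-n variance.
    """
    return (mu ** 2 + sigma ** 2 / n) * (n - 1) * sigma ** 2 / n


def simulate(mu, sigma, n, reps, seed=1):
    """Monte Carlo averages of xbar^2 * s2 and of the odd term (xbar - mu) * s2."""
    rng = random.Random(seed)
    total_h = total_odd = 0.0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / n
        total_h += xbar ** 2 * s2
        total_odd += (xbar - mu) * s2
    return total_h / reps, total_odd / reps


if __name__ == "__main__":
    mu, sigma, n = 2.0, 1.0, 5  # illustrative values only
    mc_h, mc_odd = simulate(mu, sigma, n, reps=20_000)
    print(two_phase_expectation(mu, sigma, n), mc_h, mc_odd)
```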
Again, it is important to note that, independently of $\bar{x}$, $n s^2/\sigma^2 = \sum_{i=1}^{n}(X_i - \bar{x})^2/\sigma^2$ follows a $\chi^2$ distribution with $n-1$ degrees of freedom; this result governs the second-phase expectation applied to (12). By applying (19) within (18), and again through the application of integration by parts, the second-phase expectation may be obtained, either directly or in an equivalent alternative form.
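The distributional facts used in the second phase can also be checked by simulation. Assuming normal data, the sketch below draws paired values of $\bar{x}$ and $Q = \sum_{i=1}^{n}(X_i - \bar{x})^2/\sigma^2$ and compares the mean and variance of $Q$ with the $\chi^2_{n-1}$ values $n-1$ and $2(n-1)$; a near-zero sample correlation between $\bar{x}$ and $Q$ is consistent with, though not proof of, their independence. Function names and parameter values are illustrative.

```python
import random
import statistics


def draw_xbar_and_q(mu, sigma, n, reps, seed=7):
    """Paired draws of xbar and Q = sum((x_i - xbar)^2) / sigma^2 from N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    xbars, qs = [], []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        xbars.append(xbar)
        qs.append(sum((xi - xbar) ** 2 for xi in x) / sigma ** 2)
    return xbars, qs


def sample_correlation(a, b):
    """Pearson correlation computed directly (avoids requiring Python 3.10+)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    n = 6
    xbars, qs = draw_xbar_and_q(mu=0.0, sigma=1.5, n=n, reps=20_000)
    print("mean Q:", statistics.fmean(qs), "expected", n - 1)
    print("var Q:", statistics.variance(qs), "expected", 2 * (n - 1))
    print("corr(xbar, Q):", sample_correlation(xbars, qs))
```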
As such, the UMVUE of the relative efficiency of the alternate estimator versus $\bar{x}$ would be derived from (12), including (16) and (19). The condition under which the alternate estimator is the more efficient of the two then follows per (17), expressed in terms of the coefficient of variation per (16).
Given the aforementioned, the alternate estimator defined in (10) would be a more efficient estimator of the normal population mean, $\mu$, than $\bar{x}$. It is important to also note that this estimator could be expressed as a function of the square of the sample coefficient of variation.
3. The Improved Lognormal Mean Estimator
As previously noted, the purpose of this research investigation was to improve upon and test the estimator proposed by Shen et al. (2006), presented initially in (8).
Offered in (25), the development of the proposed estimator draws upon (10), and the expression in (27) is obtained through substitution from (8) and (26). This describes the approach through which the estimator from (10) was developed into the expression in (25) and further into (27). To illustrate, if is even, the term is negligible.
To derive an efficient estimator of the normal variance using the sample coefficient of variation, the sixth-iteration efficient estimator of the normal variance presented in Lovric and Sahai may be applied. Consequently, the proposed lognormal mean estimator from the current investigation, GS(2014), is obtained from this variance estimator.
4. Empirical Simulation Study
4.1. Study Methodology
To compare the proposed estimator, GS(2014), with the existing efficient estimator, SHEN(2006), and the usual maximum likelihood estimator, UER, a simulation study was undertaken using various sample sizes and population mean values, with a fixed variance. Some 11,000 iterations were conducted across sample sizes using MATLAB R2010b (The MathWorks, Inc., Natick, MA), drawing random samples from the specified population. Comparisons of GS(2014) versus UER and of SHEN(2006) versus UER were expressed in percentage terms as relative efficiencies, RelEff%, based on the RMSEs per (4) and (9). The actual RMSEs of all estimators in (33), (34), and (35) were calculated as an average across the simulation’s 11,000 iterations.
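The relative-efficiency comparison described above can be sketched as a small Monte Carlo harness. Two points are assumptions rather than details taken from (33) to (35): the RMSE is computed as the average of $(\hat{\theta}-\theta)^2/\theta^2$, and RelEff% is taken as $100 \times \mathrm{RMSE}(\mathrm{UER}) / \mathrm{RMSE}(\text{candidate})$, so that values above 100 favor the candidate. Neither GS(2014) nor SHEN(2006) is implemented here; `plugin_unbiased` is a hypothetical placeholder competitor, and all function names are illustrative.

```python
import math
import random


def uer_from_logs(x):
    """UER computed from log-scale data x = ln(y): exp(xbar + s2 / 2), divisor-n s2."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / n
    return math.exp(xbar + s2 / 2)


def plugin_unbiased(x):
    """Hypothetical placeholder competitor: exp(xbar + S2 / 2) with divisor n - 1."""
    n = len(x)
    xbar = sum(x) / n
    big_s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return math.exp(xbar + big_s2 / 2)


def simulated_rmse(estimators, mu, sigma, n, reps=11_000, seed=2014):
    """Average relative squared error (est - theta)^2 / theta^2 per estimator.

    All estimators are applied to the same simulated log-scale samples.
    """
    rng = random.Random(seed)
    theta = math.exp(mu + sigma ** 2 / 2)
    totals = {name: 0.0 for name in estimators}
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]  # X = ln(Y) ~ N(mu, sigma^2)
        for name, estimator in estimators.items():
            totals[name] += (estimator(x) - theta) ** 2 / theta ** 2
    return {name: total / reps for name, total in totals.items()}


def rel_eff_percent(rmse_reference, rmse_candidate):
    """Assumed RelEff% = 100 * RMSE(reference) / RMSE(candidate); above 100 favors the candidate."""
    return 100.0 * (rmse_reference / rmse_candidate)


if __name__ == "__main__":
    rmse = simulated_rmse({"UER": uer_from_logs, "placeholder": plugin_unbiased},
                          mu=0.0, sigma=1.0, n=10)
    print(rel_eff_percent(rmse["UER"], rmse["placeholder"]))
```

Evaluating every estimator on the same simulated samples (common random numbers) reduces the simulation noise in the comparison relative to drawing separate samples for each estimator.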
As presented in Table 1, the relative efficiencies of both the proposed estimator, GS(2014), and the existing efficient estimator, SHEN(2006), showed improvements over the usual maximum likelihood estimator, UER. In general, relative efficiencies for both efficient estimators increased as sample size decreased, though the proposed estimator, GS(2014), was also more efficient at lower population standard deviations. Across 74 of the 77 analytic categories (96%), GS(2014) yielded higher relative efficiencies than SHEN(2006), with the absolute difference being most pronounced at lower sample sizes combined with lower population standard deviations. To illustrate, the greatest absolute difference favoring GS(2014) over SHEN(2006) was +21.33 percentage points. Comparatively, the maximum difference across the three cases where SHEN(2006) outperformed the proposed estimator was 0.9 percentage points. The relative efficiencies for GS(2014) ranged from a low of 100.64 percent to a high of 129.47 percent, compared with a low of 101.05 percent and a high of 108.71 percent for the SHEN(2006) estimator. Collectively, the findings support the proposed estimator, GS(2014), under almost all combinations of sample sizes and population standard deviations, compared with both the current efficient estimator and the usual maximum likelihood estimator. In the instances where the existing efficient estimator performed better, the differences were small and perhaps clinically negligible (i.e., absolute differences below 1.0 percentage point).
Within the context of clinical and other scientific research, a substantial need exists for an accurate point estimate of a lognormal mean. The transformation of highly skewed data is often undertaken to satisfy the assumptions required for parametric statistical inference. Despite this, existing approaches that capture only a sample’s mean and variance do not necessarily yield the most efficient estimator. The current investigation developed and tested a more efficient point estimator for a lognormal mean model by capturing more complete information within the sample’s coefficient of variation. Results of an empirical simulation study across varying sample sizes and population standard deviations indicated relative efficiencies of up to 129.47 percent compared with the usual maximum likelihood estimator and up to 21.33 percentage points above the current efficient estimator. The relative efficiency of the proposed estimator increased particularly as a function of decreasing sample size and increasing population standard deviation.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
G. H. Skrepnek, “Regression methods in the empirical analysis of health care data,” Journal of Managed Care Pharmacy, vol. 11, no. 3, pp. 240–251, 2005.
R. L. Patterson, “Difficulties involved in the estimation of a population mean using transformed sample data,” Technometrics, vol. 8, pp. 535–537, 1966.
X. H. Zhou, “Estimation of the log-normal mean,” Statistics in Medicine, vol. 17, pp. 2251–2264, 1998.
G. W. Oehlert, “A note on the delta method,” American Statistician, vol. 46, pp. 27–29, 1992.
M. M. Lovric and A. Sahai, “An iterative algorithm for efficient estimation of normal variance using sample coefficient of variation,” InterStat, article 001, 2011.
M. S. Nikulin, “Efficiency of a statistical procedure,” in Encyclopedia of Mathematics, M. Hazewinkel, Ed., Springer, Berlin, Germany, 1992.