Abstract

An omnibus test for normality with an adjustment for symmetric alternatives is developed using the empirical likelihood ratio technique. We first transform the raw data via a jackknife technique that deletes one observation at a time. The probability integral transformation is then applied to the transformed data; under the null hypothesis, the resulting values have a limiting uniform distribution, which reduces testing for normality to testing for uniformity. Employing the empirical likelihood technique, we show that the test statistic has a limiting chi-square distribution. We also demonstrate that, under the symmetric settings considered, the CUSUM-type and Shiryaev–Roberts test statistics have comparable properties and power. The proposed test has good control of type I error. Monte Carlo simulations reveal that the proposed test outperforms the classical tests studied under symmetric short-tailed alternatives. Findings from a real data study further demonstrate the robustness and applicability of the proposed test in practice.

1. Introduction

The empirical likelihood (EL) methodology was introduced in [1, 2] and has been widely studied as a nonparametric counterpart of the parametric likelihood approach (e.g., [3–6]). Thus, it utilizes the concept of likelihood in a distribution-free manner to approximate optimal parametric likelihood-based techniques. The method provides a versatile approach that can be applied to inference in a wide variety of statistical applications. One area with substantial recent development in the use of EL methods is hypothesis testing. Various researchers have proposed goodness-of-fit (GoF) tests for continuous distributions based on the EL for a wide range of hypotheses, including tests of exponentiality [7, 8], the logistic distribution [9], uniformity [10], and normality [11, 12].

From the various proposed EL testing procedures, as well as from current statistical practice, it is evident that testing the composite hypothesis of normality is the most common research focus in GoF testing. The continued need for normality tests is attributable to the frequent use of normally distributed data across pure and applied statistics. Although it is difficult to propose a test for normality that competes with the highly efficient family of Shapiro–Wilk tests (e.g., [13–16]), the proposed EL-based normality tests have proved to be superior under certain alternative distributions [12]. Of these tests, the moment-based tests seem to have gained the most traction due to their flexibility, simplicity, power properties, and convenient use as omnibus tests in assessing the normality of underlying continuous distributions.

To test for normality, Dong and Giles [11] proposed an omnibus test statistic by directly utilizing the EL methodology outlined by Owen [17]. They utilized the first four moment constraints that characterize the normal distribution. After outlining drawbacks of the test proposed by Dong and Giles [11], Shan et al. [12] proposed a cumulative sum- (CUSUM-) type simple and exact empirical likelihood ratio-based (SEELR) test statistic for normality which, unlike that of Dong and Giles [11], has good control of type I error (also see [18]) and can easily be implemented in a wide range of statistical packages. The test by Shan et al. [12] is an omnibus test that standardizes the sample observations using the Lin and Mudholkar [19] jackknife transformation. In their study, Shan et al. [12] reported that the power of their proposed omnibus test is comparable to that of well-known existing tests and often outperforms them under certain alternatives, mostly asymmetric distributions.

Like several tests for normality, the test proposed by Shan et al. [12] suffers a loss of power under several symmetric alternatives. It is a challenge to propose an omnibus test with higher power than the classical Jarque–Bera tests [20, 21] and the D’Agostino–Pearson test [22] in detecting departures from normality for alternatives that share the symmetric shape of the normal distribution. Through the utilization of various mathematical and statistical properties that characterize the normal distribution (for example, see [19, 23, 24]), one can remedy such shortfalls in GoF tests. One such remedy is transformation to uniformity, which has several benefits, including increasing the power of a test under certain alternatives (for example, see [8, 25]). For a data-driven omnibus test for symmetry, Fang et al. [25] utilized a bootstrapping approach coupled with the probability integral transformation, under which the transformed data have a limiting uniform distribution under the null hypothesis. For superior power under symmetric alternatives, their proposed test required only odd-ordered orthogonal moments of the transformed data in constructing the test statistic.

The probability integral transformation has been widely used in the development of GoF tests of normality, especially in empirical distribution function- (EDF-) based tests. Rosenblatt [26] first introduced the concept. The EDF tests make use of the probability integral transformation U = F(X): if F is the distribution function of X, the random variable U = F(X) is uniformly distributed between 0 and 1. Given observations X_1, ..., X_n, the values U_i = F(X_i) are computed. The most commonly used EDF tests for normality are the Anderson–Darling test [27, 28], the Lilliefors test, well known as the modified Kolmogorov–Smirnov test [29], and the Cramér–von Mises test [30]. In addition to the probability integral transformation, several other approaches have been used to construct GoF tests for the composite hypothesis of normality. In this study, we adopt the EL methodology to propose a new omnibus test for normality by exploiting different forms of characterizing the normal distribution. The purpose of this paper is to use the jackknife characterization due to Shan et al. [12], as in Lin and Mudholkar [19], followed by a probability integral transformation (see [8, 25]), to develop a goodness-of-fit test for normality. Here we obtain a GoF test statistic by combining two well-known characterizations that are individually powerful against different classes of alternatives. However, following the work of Fang et al. [25], we restrict attention to symmetric alternatives. Power comparisons are conducted with some of the most widely known EDF-based tests, well-known and powerful moment-based tests, and the powerful classical SW tests.
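As a brief illustration of this transformation with estimated parameters, the following R sketch computes the transformed values and a Kolmogorov–Smirnov-type distance of the kind that underlies the Lilliefors test (the simulated sample and the direct computation are purely illustrative; the Lilliefors test pairs this statistic with its own null distribution):

set.seed(1)
x <- rnorm(30)                                 # hypothetical sample to be tested
u <- pnorm(x, mean = mean(x), sd = sd(x))      # probability integral transform with estimated parameters
u_sorted <- sort(u)
i <- seq_along(u_sorted)
# Kolmogorov-Smirnov-type distance between the EDF of u and the uniform CDF
D <- max(pmax(i/length(u) - u_sorted, u_sorted - (i - 1)/length(u)))
D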

2. Test Development

Consider an unknown continuous distribution F with nonordered random variables denoted by X_1, X_2, ..., X_n that are assumed to be independent and identically distributed according to F. The intention is to test whether the observations are consistent with a normal distribution. Thus, we intend to test whether to accept or reject the null hypothesis

H_0: F(x) = Φ((x − μ)/σ) for all x,

where μ and σ are unknown parameters and Φ denotes the standard normal distribution function. We then propose to use standardized versions of the sample observations. To achieve this, we adopt a jackknife transformation technique that deletes one observation at a time, following the work of Lin and Mudholkar [19] (also see [12]). Thus, we transform the observations using

Z_i = sqrt(n/(n − 1)) (X_i − X̄)/s_(−i),   i = 1, ..., n,

where X̄ = (1/n) Σ_{i=1}^{n} X_i is the sample mean and s_(−i) is the sample standard deviation computed with the i-th observation deleted. It should be noted, according to Shan et al. [12], that as n gets large the standardized data points become asymptotically independent, while under the null hypothesis they follow a t distribution with n − 2 degrees of freedom, which approaches the standard normal as n grows. In addition to this transformation, we further adopt the probability integral transformation (see [25] as well as [8]). The probability integral transformation then transforms the standardized random variables into random variables U_i = Φ(Z_i) that are asymptotically independent and uniformly distributed. That is, under the null hypothesis, the transformed data follow the uniform distribution asymptotically. From the proposed transformation, U_1, ..., U_n are uniformly distributed on (a, b), where the density function of U is given by

f(u) = 1/(b − a),   a ≤ u ≤ b,

where a is the lowest value of U and b is the highest value of U. The k-th raw moment of the uniform distribution is defined by

E(U^k) = (b^(k+1) − a^(k+1)) / ((k + 1)(b − a)).
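The two-stage transformation can be sketched in R as follows (a minimal sketch that mirrors the Appendix code; the simulated input and the helper name jackknife_pit are illustrative):

# Jackknife standardization followed by the probability integral transformation
jackknife_pit <- function(x){
  n <- length(x)
  z <- numeric(n)
  for(i in 1:n){
    # leave-one-out standardization (Lin-Mudholkar-type jackknife)
    z[i] <- sqrt(n/(n - 1)) * (x[i] - mean(x)) / sd(x[-i])
  }
  pnorm(z)  # approximately Uniform(0, 1) under the null hypothesis of normality
}

set.seed(123)
u <- jackknife_pit(rnorm(50))  # under H0 the values in u are approximately uniform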

From the uniformly transformed observations, we then propose to test the null hypothesis

H_0: U_1, U_2, ..., U_n follow the uniform distribution on (0, 1)

versus the alternative that U_1, U_2, ..., U_n are from a nonuniform distribution defined on (0, 1). In accordance with the EL method, under H_0 we consider unbiased empirical moment equations utilizing the raw moments of the uniform distribution, which are given by

E(U^k) = 1/(k + 1),   k = 1, 2, ...,

so that (1/n) Σ_{i=1}^{n} U_i^k is an unbiased estimator of 1/(k + 1) under H_0.
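A short numerical check of the moment formula is given below (the function name momentFU follows the Appendix; the simulated uniform sample is illustrative):

# k-th raw moment of Uniform(a, b) and its empirical counterpart
momentFU <- function(k, a, b) (b^(k + 1) - a^(k + 1)) / ((k + 1) * (b - a))

momentFU(3, 0, 1)   # 1/4, the third raw moment of Uniform(0, 1)
u <- runif(1000)
mean(u^3)           # empirical third moment, close to 1/4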

The composite hypothesis for the ELR test is then given by

H_0: E(U^k) = μ_k   versus   H_1: E(U^k) ≠ μ_k,

where μ_k = 1/(k + 1) denotes the k-th raw moment of the uniform distribution on (0, 1).

The nonparametric empirical likelihood function corresponding to the given hypotheses is expressed as

L(p_1, ..., p_n) = ∏_{i=1}^{n} p_i,

where the unknown probability parameters p_1, ..., p_n are attained under H_0 and H_1. Under H_0, the EL function above is maximized with respect to the p_i subject to two constraints:

Σ_{i=1}^{n} p_i = 1   and   Σ_{i=1}^{n} p_i U_i^k = μ_k.

Following this, the weights p_i are identified as

p_i = (1/n) {1 + λ(U_i^k − μ_k)}^(−1),

where 0 < p_i < 1 for i = 1, ..., n. Using the Lagrangian multiplier technique, it can be shown that the maximum EL function under H_0 can be expressed as

L_{H_0} = ∏_{i=1}^{n} (1/n) {1 + λ(U_i^k − μ_k)}^(−1),

where the multiplier λ is a root of

Σ_{i=1}^{n} (U_i^k − μ_k) / {1 + λ(U_i^k − μ_k)} = 0.
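For illustration, the multiplier λ and the resulting weights can be computed numerically as sketched below; in practice the el.test function of the emplik R package (used in the Appendix) performs this optimization internally. The helper name el_weights and the simulated sample are illustrative, not part of the proposed method.

# Solve for lambda and the EL weights for a single moment constraint E(U^k) = mu_k
el_weights <- function(u, k, mu_k){
  g <- u^k - mu_k                              # centered moment contributions
  score <- function(lambda) sum(g / (1 + lambda * g))
  # lambda must keep all weights positive: 1 + lambda * g_i > 0 for every i
  lower <- -1/max(g) + 1e-6
  upper <- -1/min(g) - 1e-6
  lambda <- uniroot(score, lower = lower, upper = upper)$root
  p <- 1 / (length(u) * (1 + lambda * g))      # EL probability weights
  list(lambda = lambda, weights = p)
}

set.seed(1)
u <- runif(50)
w <- el_weights(u, k = 3, mu_k = 1/4)
sum(w$weights)          # equals 1 (up to root-finding tolerance)
sum(w$weights * u^3)    # equals 1/4, the imposed constraint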

Under the alternative hypothesis, the moment constraint is not required to identify the weights; only the constraint Σ_{i=1}^{n} p_i = 1 is needed to maximize the EL function. Thus, under H_1, the weights are p_i = 1/n and the nonparametric EL function is given by

L_{H_1} = ∏_{i=1}^{n} (1/n) = n^(−n).

Now let ℓ_k denote the −2 log-likelihood ratio test statistic for the hypotheses above. It should be noted that, under H_0, minus two times the log-likelihood ratio has an asymptotic chi-square limiting distribution with one degree of freedom [1]. Considering the null and alternative hypotheses, the test statistic is given by

ℓ_k = −2 log(L_{H_0}/L_{H_1}) = 2 Σ_{i=1}^{n} log{1 + λ(U_i^k − μ_k)}.
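A minimal sketch of this single-constraint statistic using el.test from the emplik package (the simulated uniform sample stands in for the transformed data):

library(emplik)

set.seed(1)
u <- runif(50)                          # transformed data; uniform under H0
ell3 <- el.test(u^3, mu = 1/4)$"-2LLR"  # -2 log ELR for the constraint E(U^3) = 1/4
ell3 > qchisq(0.95, df = 1)             # asymptotic 5% decision for this single constraint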

We then propose to reject the null hypothesis using two different test statistics. Firstly, we consider the cumulative sum- (CUSUM-) type statistic given by

W_n = max_{k ∈ K} ℓ_k,

where K is a set of moment orders and H_0 is rejected for large values of W_n.

Secondly, we consider the common alternative to the CUSUM-type statistic, which is to utilize the Shiryaev–Roberts (SR) statistic (for example, see [31] among others). In our case, the classical SR statistic is of the form

R_n = Σ_{k ∈ K} exp(ℓ_k),

and H_0 is rejected when the statistic exceeds the test threshold C_α, where C_α is the 100(1 − α)th percentile of the null distribution of the statistic. The set K consists of integer values representing the moment constraints over which the test statistic is maximized. Our proposed test statistics are developed utilizing approaches introduced by Vexler and Wu [32], and various authors have demonstrated that the SR statistic and the CUSUM-type statistic have almost equivalent optimal statistical properties due to their common null-martingale basis [31]. The choice of K is also vital in moment-based test statistics. Fang et al. [25] utilized the probability integral transformation and recommended that, for a test of symmetry, only odd-ordered moments of the transformed data are required in constructing the test statistic. They further noted that the use of odd-ordered moments has several benefits, including power against most symmetric alternatives, robustness, and good performance under small sample sizes.
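Both statistics can be computed in a few lines, mirroring the Appendix code; the helper name proposed_stats and the simulated input are illustrative, and the set K = {3, 5} anticipates the recommendation discussed below:

library(emplik)

proposed_stats <- function(u, K = c(3, 5)){
  ell <- sapply(K, function(k) el.test(u^k, mu = 1/(k + 1))$"-2LLR")
  c(CSELR = max(ell),       # CUSUM-type statistic
    SRELR = sum(exp(ell)))  # Shiryaev-Roberts-type statistic
}

set.seed(1)
u <- runif(50)        # stands in for the jackknife/PIT-transformed data
proposed_stats(u)     # reject H0 when a statistic exceeds its simulated critical value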

For the proposed test statistics, we conducted an extensive Monte Carlo simulation exercise to empirically evaluate the choice of K that gives optimal power under symmetric alternatives when testing for normality. Following the work of Fang et al. [25] as well as Shan et al. [12], we estimated the powers of the test statistics for different alternatives and different sets of odd-ordered moments of the transformed data. Table 1 displays a subset of the Monte Carlo simulation results. We also considered additional alternatives based on samples of sizes n = 20, 50, and 100 at a fixed nominal significance level. We used size-adjusted critical values for each test statistic, and power for each test was computed using 5,000 replications. The results were that both the CUSUM-type and the Shiryaev–Roberts test statistics with K = {3, 5} showed an average power greater than that of all other cases under symmetric short-tailed alternatives. In addition, the CUSUM-type test statistic showed an average power greater than that of all other cases under symmetric long-tailed alternatives for a different choice of K. Since our proposed test is meant to perform well under symmetric alternatives, we recommend using K = {3, 5}, since the proposed test statistics with this choice are simple and provide relatively high levels of power under symmetric alternatives. The results of our Monte Carlo simulation experiments are consistent with those of Fang et al. [25]. In this article, we denote the CUSUM-type test statistic by CSELR and the Shiryaev–Roberts test statistic by SRELR. A schematic algorithm of the testing procedure is shown in Figure 1.

3. Monte Carlo Simulation Procedures

We utilized the R statistical package for all the simulation procedures. Firstly, size-adjusted critical values for the proposed test statistics were determined. To achieve this, we used 50,000 replications, and, without loss of generality, data were simulated from a standard normal distribution at the stipulated sample sizes and α-levels. Only samples of sizes 20 to 100 were considered (see Table 2). This was motivated entirely by the need to use sample sizes that commonly arise in practice.
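A minimal sketch of this step is given below; the helper functions repeat the Section 2 sketches so the code runs on its own, and a reduced number of replications is used for brevity (the study used 50,000):

library(emplik)

jackknife_pit <- function(x){   # jackknife standardization + probability integral transform
  n <- length(x)
  z <- sapply(1:n, function(i) sqrt(n/(n - 1)) * (x[i] - mean(x)) / sd(x[-i]))
  pnorm(z)
}
cselr <- function(u, K = c(3, 5)) max(sapply(K, function(k) el.test(u^k, mu = 1/(k + 1))$"-2LLR"))

set.seed(2023)
null_stats <- replicate(5000, cselr(jackknife_pit(rnorm(50))))  # null distribution at n = 50
quantile(null_stats, probs = c(0.90, 0.95, 0.99))               # critical values for alpha = 0.10, 0.05, 0.01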

For power comparisons, twelve selected competitor tests were considered. The choice of these tests was guided by potential competitiveness: tests developed using similar characterization techniques as well as well-known, powerful classical normality tests. Three broad categories of competitor tests were established: EDF-based tests, moment-based tests, and Shapiro–Wilk-based tests. For the EDF-based tests, we opted for the Anderson–Darling (AD) test [27, 28], the Lilliefors (LL) test, well known as the modified Kolmogorov–Smirnov test [29], the Cramér–von Mises (CVM) test [30], and the test of Torabi et al. [33]. Moment-based tests included the Jarque–Bera (JB) test [20, 21], the robust Jarque–Bera (RJB) test [20, 21], the kurtosis test [14], the skewness test [14], the D’Agostino–Pearson (DP) test [22], and the simple and exact empirical likelihood ratio (SEELR) test based on moment relations [12]. Lastly, the third category of competitor tests consisted of the classical, well-known, and powerful Shapiro–Wilk (SW) test [13] and the Shapiro–Francia (SF) test [15], which is a modification of the SW test. Most of these competitor tests have proved to be powerful against a wide range of alternatives, including symmetric ones [33–39].
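Several of these competitor tests are available in standard R packages, for example (a sketch only; the availability of the nortest and tseries packages is an assumption of the illustration, and the sample x is simulated):

# install.packages(c("nortest", "tseries"))
library(nortest)   # ad.test, lillie.test, cvm.test, sf.test
library(tseries)   # jarque.bera.test

x <- rnorm(50)                   # hypothetical sample
ad.test(x)$p.value               # Anderson-Darling
lillie.test(x)$p.value           # Lilliefors (modified Kolmogorov-Smirnov)
cvm.test(x)$p.value              # Cramer-von Mises
sf.test(x)$p.value               # Shapiro-Francia
shapiro.test(x)$p.value          # Shapiro-Wilk (base R)
jarque.bera.test(x)$p.value      # Jarque-Bera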

In terms of the alternative distributions, we considered distributions that cover a wide range of symmetric distributional properties. Following Esteban et al. [40] and Torabi et al. [33], we considered the following alternative distributions, which can be classified into two broad sets of symmetric alternatives: (1) symmetric short-tailed distributions and (2) symmetric long-tailed distributions. In order to evaluate the proposed tests under asymmetric alternatives, we also included a third set (3) of asymmetric alternatives. Samples from these alternatives can be generated with standard R generators, as sketched after the list.

(1) Set 1: symmetric short-tailed distributions:
(i) The beta distribution with parameters (3, 3), (2, 2), (1, 1), and (0.5, 0.5)
(ii) The uniform distribution with a = 0 and b = 1
(iii) The logit-normal distribution
(iv) The standard normal distribution truncated at (a, b), i.e., (−2, 2) and (−1, 1)
(v) Tukey’s lambda distribution with λ values including 1.25

(2) Set 2: symmetric long-tailed distributions:
(i) Student’s t distribution with 2, 4, 7, and 15 degrees of freedom
(ii) The Cauchy distribution
(iii) The logistic distribution
(iv) The double exponential distribution (also known as the Laplace distribution)
(v) Tukey’s lambda distribution with λ values including −0.25

(3) Set 3: asymmetric distributions:
(i) The gamma distribution with parameters (2, 1)
(ii) The Weibull distribution with parameters (2, 1)
(iii) The skewed normal distribution with parameters (0, 1, 5)
(iv) The skewed Cauchy distribution with parameters (0, 2, 5)
(v) The beta distribution with parameters (2, 1) and (3, 1.5)
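As referenced above, samples from several of the listed alternatives can be drawn with base R generators; parameter values are shown only where the list specifies them, and the remaining settings (for example, the standard Cauchy and Laplace) are illustrative defaults:

n <- 50
x_beta    <- rbeta(n, 3, 3)                         # symmetric short-tailed
x_unif    <- runif(n, 0, 1)                         # symmetric short-tailed
x_trnorm  <- qnorm(runif(n, pnorm(-2), pnorm(2)))   # standard normal truncated at (-2, 2)
x_t4      <- rt(n, df = 4)                          # symmetric long-tailed
x_cauchy  <- rcauchy(n)                             # symmetric long-tailed (standard Cauchy)
x_laplace <- rexp(n) * sample(c(-1, 1), n, TRUE)    # standard Laplace via a signed exponential
x_gamma   <- rgamma(n, shape = 2, rate = 1)         # asymmetric
x_weibull <- rweibull(n, shape = 2, scale = 1)      # asymmetric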

For the power simulation, 10,000 samples of each of the considered sizes (20 to 100) were obtained under the various alternative distributions. Power was computed as the number of times the test rejected the null hypothesis divided by the total number of replications. A numerical bootstrap study on real data was conducted to assess the robustness and applicability of the proposed tests. However, it was necessary to first assess the type I error control of the proposed tests before the power study.

3.1. Type I Error Control

Here, we provide the values of the type I error rates, along with the associated standard errors, of the proposed tests at the considered nominal levels up to α = 0.10. To compute these quantities, for each nominal alpha we generated 500,000 random samples from a standard normal distribution for each of the considered sample sizes. The results presented in Table 3 show that the proposed tests control type I error very well. Figure 2 includes plots of the simulated type I error rates for α = 0.05 only, for all the sample sizes considered. The plots of the empirical cumulative probability function of the simulated values for the remaining settings were omitted, since they were more or less the same as those shown at α = 0.05 for the other sample sizes. It is evident that the plots have the expected appearance in all the simulated scenarios; that is, the simulated type I error rates lie close to the nominal α-level. The closeness of the estimated probabilities of type I error to the nominal value attests that the GoF test performs as expected. These results were extended to evaluate type I error control when simulating from a normal distribution with varying parameters μ and σ. We considered several combinations of μ and σ over a range of sample sizes and α-levels up to 0.10 (see Table 4). As in Table 3, the estimated probabilities of type I error were close to the respective nominal values, which shows that the GoF test performs as expected. It is important to note that various alternative methods for assessing the closeness of the simulated type I error rates to the nominal size α are available in the literature; the most popular is based on the central limit theorem and is described in detail by Batsidis et al. [41]. Once the type I error rates were examined, we proceeded to evaluate the powers of the proposed tests to determine how well they detect departures from normality and to compare their power performance with that of the selected competing tests.
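As a sketch of how such rates can be reported and judged, the Monte Carlo standard error and a CLT-based acceptance band around the nominal level, in the spirit of the approach mentioned above, can be computed as follows (the rejection rate p_hat below is a hypothetical value, not a result from Table 3):

alpha <- 0.05
N     <- 500000                          # replications per setting
p_hat <- 0.0503                          # hypothetical simulated rejection rate
se    <- sqrt(p_hat * (1 - p_hat) / N)   # Monte Carlo standard error of the estimate
band  <- alpha + c(-1, 1) * qnorm(0.975) * sqrt(alpha * (1 - alpha) / N)
c(estimate = p_hat, se = se, lower = band[1], upper = band[2])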

3.2. Monte Carlo Power Simulation Results

Results of the Monte Carlo power comparisons are presented in this section. Bold numbers in all tables represent the two most powerful tests under the respective simulated scenarios. From Tables 5 and 6, we found that when the alternative distributions are short-tailed and symmetric, our proposed tests performed quite well. Under these symmetric alternatives, the proposed tests (SRELR and CSELR) significantly outperformed all other tests studied. Tests based on LL, JB, and RJB, among others, had the least power compared to the other tests. In general, the tests based on SRELR, CSELR, DP, and SW, among others, are the most powerful under these symmetric short-tailed alternative distributions.

For symmetric long-tailed alternatives (see Tables 7 and 8), the tests based on RJB, SF, and JB, among others, are superior, whereas the tests based on LL and SEELR, among others, are the least powerful. Our proposed tests performed slightly worse than the DP test but were comparable to the SW test. It is important to note that, in all of the cases under these symmetric long-tailed alternatives, our proposed tests outperformed all the EDF-based tests.

For the asymmetric alternatives considered (see Table 9), the tests based on SEELR, SW, SF, and AD are superior, whereas the tests based on RJB and SRELR, among others, are the least powerful. Our proposed test based on CSELR performed slightly worse than the JB test but was comparable to the LL test. It is important to note that, in all of the cases under these asymmetric alternatives, our proposed test based on CSELR outperformed the SRELR-based test.

To obtain a clearer visualisation of the performance of the different normality tests, a ranking procedure was used. Tables 10 to 12 contain the rankings of all the tests considered in this study according to the average powers computed from the values in Tables 5 and 6, Tables 7 and 8, and Table 9, respectively. The ranks are based on the respective sets of alternative distributions and sample sizes. Using average powers, we can select the tests that are, on average, most powerful against the alternatives in the given sets. It should be noted that, under all the symmetric simulated scenarios, the proposed tests (SRELR and CSELR) were comparable in power.

From Table 10, it can clearly be seen that our proposed tests (SRELR and CSELR) are the most powerful tests for both small and moderate sample sizes under symmetric short-tailed alternatives. They are followed rather closely by the DP test. The total rank based on all sample sizes (i.e., n = 20 to 100) shows that the proposed tests (SRELR and CSELR) are overall the most powerful tests for symmetric short-tailed distributions.

For symmetric long-tailed alternatives (see Table 11), generally the RJB test was the most powerful in both small and moderate sample sizes. Our proposed tests had comparable power with the AD test under small samples for symmetric long-tailed alternatives. However, under moderate sample sizes, our proposed tests were slightly more powerful than the DP and SW tests. Lastly, considering all the sample sizes under symmetric long-tailed alternatives, our proposed tests were comparable to the SW test.

Lastly, under asymmetric alternatives (see Table 12), our proposed test based on CSELR performed better than the SRELR-based test. It is also worth noting that our related test, the SEELR, outperformed all other tests under these asymmetric alternatives. Unlike some of the competitor tests, our proposed tests were consistent in power across all alternative distributions in all simulated scenarios.

4. Real Data Study

We used the snowfall dataset to examine the applicability of the proposed test on real data. The snowfall dataset consists of 63 snow precipitation values that were recorded from the year 1910 to 1972. The dataset has been extensively used in various statistical applications; see, for example, Thaler [42], Carmichael [43], Tukey [44], and Parzen [45] to illustrate and compare various statistical techniques. The snowfall dataset is presented as follows: 126.4, 82.4, 78.1, 51.1, 90.9, 76.2, 104.5, 87.4, 110.5, 25.0, 69.3, 53.5, 39.8, 63.6, 46.7, 72.9, 79.6, 83.6, 80.7, 60.3, 79.0, 74.4, 49.6, 54.7, 71.8, 49.1, 103.9, 51.6, 82.4, 83.6, 77.8, 79.3, 89.6, 85.5, 58.0, 120.7, 110.5, 65.4, 39.9, 40.1, 88.7, 71.4, 83.0, 55.9, 89.9, 84.8, 105.2, 113.7, 124.7, 114.5, 115.6, 102.4, 101.4, 89.8, 71.5, 70.9, 98.3, 55.5, 66.1, 78.4, 102.5, 97.0, 110.0.

The snowfall data are well known to be consistent with the normal distribution. We plotted a histogram and a Q-Q plot to examine the normality of the snowfall data (see Figure 3).

From the plots, it is clearly visible that the snowfall data are consistent with a normal distribution. Following the ideas introduced by Stigler [46], we conducted a bootstrap-type study to empirically examine the proposed test based on the two statistics, CSELR and SRELR. The approach was to draw a sample of size 60 at random from the snowfall data and test for normality at the 0.05 level of significance. We repeated this strategy 10,000 times, and the bootstrap-type procedure gave a p value of 0.7755 for the proposed CSELR test and 0.1451 for the SRELR test. To further examine the normality of the snowfall data, we repeated the bootstrap-type study using the AD, CVM, JB, and SW tests. The p values obtained, namely, 0.6862 for the AD test, 0.6921 for the CVM test, 0.5702 for the JB test, and 0.6650 for the SW test, all lead one to conclude that the snowfall data are indeed normally distributed. Thus, the p values obtained from the traditional tests as well as from our proposed tests prove reliable in illustrating the normality of the snowfall data, and the proposed test statistics are indeed applicable to real-life data.
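A sketch of this resampling scheme is given below, using the built-in Shapiro–Wilk test as a stand-in for the proposed statistics; whether resampling is done with or without replacement and how the p values are summarized are assumptions of the illustration, not necessarily the exact choices made in the study:

snowfall <- c(126.4, 82.4, 78.1, 51.1, 90.9, 76.2, 104.5, 87.4, 110.5, 25.0, 69.3,
              53.5, 39.8, 63.6, 46.7, 72.9, 79.6, 83.6, 80.7, 60.3, 79.0, 74.4,
              49.6, 54.7, 71.8, 49.1, 103.9, 51.6, 82.4, 83.6, 77.8, 79.3, 89.6,
              85.5, 58.0, 120.7, 110.5, 65.4, 39.9, 40.1, 88.7, 71.4, 83.0, 55.9,
              89.9, 84.8, 105.2, 113.7, 124.7, 114.5, 115.6, 102.4, 101.4, 89.8,
              71.5, 70.9, 98.3, 55.5, 66.1, 78.4, 102.5, 97.0, 110.0)

set.seed(7)
# draw subsamples of size 60 (here without replacement) and record the p value each time
pvals <- replicate(10000, shapiro.test(sample(snowfall, 60))$p.value)
mean(pvals)          # average evidence against normality across resamples
mean(pvals < 0.05)   # proportion of resamples in which normality is rejected at the 5% level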

5. Conclusion

By utilizing the EL methodology and exploiting the mathematical properties and different forms of transforming the normal distribution, we have developed simple and powerful tests for normality against symmetric alternatives. The proposed tests are consistent and control type I error very well, in line with what has been reported in other studies of EL-based GoF tests (see, for example, [8, 12, 18]). They outperformed other common traditional tests under symmetric short-tailed alternatives. The proposed tests also performed quite well under symmetric long-tailed alternatives, where they were comparable to the SW test and outperformed all the EDF tests considered. The application of the proposed tests to real data revealed their applicability as well as their robustness in practice. It would be desirable to develop an ELR-based test for normality that outperforms the classical tests under most alternative distributions that occur in practice. This might be achieved through further modifications and improvements, including further exploration of the EL methodology as well as other forms of characterizing the normal distribution. We are currently exploring the use of the EDF in developing an empirical likelihood moment-based EDF test for normality; combining the characterization of EDF-based tests with EL omnibus tests can potentially improve power for small to moderate sample sizes.

Appendix

#Required packages
library(emplik)
library(zipfR)

#Generate standardized data
genedata <- function(n){
  #Generate data (here from the Uniform(0, 1) alternative)
  s <- runif(n, 0, 1)
  x1 <- s
  for(k in 1:n){
    #leave-one-out (jackknife) standardization
    s[k] <- (x1[k] - mean(x1)) * sqrt(n/(n - 1))/(sd(x1[-k]))
  }
  return(s)
}

#Generate transformed data
genedata1 <- function(n){
  z <- genedata(n)    #standardized data
  x <- pnorm(z, 0, 1) #probability integral transformation
  return(x)
}

#Moment function for uniform distribution
momentFU <- function(k, a, b){
  z <- (b^(k + 1) - a^(k + 1))/((k + 1) * (b - a))
  return(z)
}

#Compute test statistic
teststatistic <- function(x){
  #CUSUM-type statistic
  k3 <- el.test(x^3, m3)$"-2LLR"
  k5 <- el.test(x^5, m5)$"-2LLR"
  return(max(k3, k5))
  #Shiryaev-Roberts statistic
  #k3 <- exp(el.test(x^3, m3)$"-2LLR")
  #k5 <- exp(el.test(x^5, m5)$"-2LLR")
  #return(sum(k3, k5))
}

n <- 50 #sample size
a <- 0  #lower limit for moment function
b <- 1  #upper limit for moment function

#Critical values
#Critical value for CUSUM statistic
CriticalValue <- 0.2685595 #n = 50
#Critical value for SR statistic
#CriticalValue <- 2.459526 #n = 50

MC <- 10000 #number of replications
power <- c()
m3 <- momentFU(3, a, b)
m5 <- momentFU(5, a, b)
for(i in 1:MC){
  x <- genedata1(n)
  power[i] <- teststatistic(x)
}
#Power for the test under the alternative
PowerELR <- mean(power > CriticalValue)

Data Availability

The data used to demonstrate the applicability of our proposed tests in practice are presented in this article and can also be obtained from respective authors cited in the “Real Data Study” section. All other data were simulated using R and the source code is available in the Appendix.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors wish to extend their gratitude to Professor Albert Vexler for his insightful comments on ResearchGate. This study was funded through the research incentive funds received by the main author from the Department of Higher Education and Training (DHET) of South Africa.