Journal of Probability and Statistics

Volume 2018, Article ID 8094146, 10 pages

https://doi.org/10.1155/2018/8094146

## A Simple Empirical Likelihood Ratio Test for Normality Based on the Moment Constraints of a Half-Normal Distribution

Department of Statistics, Faculty of Science and Agriculture, Fort Hare University, East London Campus, 5200, South Africa

Correspondence should be addressed to C. S. Marange; cmarange@ufh.ac.za

Received 12 May 2018; Revised 5 July 2018; Accepted 26 July 2018; Published 12 September 2018

Academic Editor: Elio Chiodo

Copyright © 2018 C. S. Marange and Y. Qin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A simple and efficient empirical likelihood ratio (ELR) test for normality based on moment constraints of the half-normal distribution was developed. The proposed test can also be easily modified to test for departures from half-normality and is relatively simple to implement in various statistical packages with no ordering of observations required. Using Monte Carlo simulations, our test proved to be superior to other well-known existing goodness-of-fit (GoF) tests considered under symmetric alternative distributions for small to moderate sample sizes. A real data example revealed the robustness and applicability of the proposed test as well as its superiority in power over other common existing tests studied.

#### 1. Introduction

Testing the distributional assumption of normality is of paramount importance in applied statistical modelling. Several well-known numerical tests for normality are widely used by investigators to supplement graphical techniques in assessing departures from normality. Amongst others, these tests include the Kolmogorov-Smirnov (KS) test [1], the Lilliefors (LL) test [2], the Anderson-Darling (AD) test [3, 4], the Shapiro-Wilk (SW) test [5], the Jarque-Bera (JB) test [6], and the D'Agostino and Pearson (DP) test [7]. These tests differ in the characteristics of the normal distribution on which they focus: some are based on the empirical distribution function (EDF), some are moment based, and some are based on regression or correlation. Some use normalized sample data whilst others use the observed values. Although these tests are commonly used in practice, they have major drawbacks. For example, some require complete specification of the null distribution, some require critical values to be computed for each specified null distribution, and some require the sample data to be ordered when computing the test statistic. Generally, most of these tests are not supported when certain combinations of parameters of a specified distribution are estimated.

Of these, the most well-known goodness-of-fit (GoF) test is the SW test, but it was originally restricted to small sample sizes (i.e., $n \le 50$). Several modifications have since been proposed. These include Royston [8], who suggested a normalizing transformation for the test in order to resolve the limitation on the sample size; Shapiro and Francia [9], who modified the test to make it suitable for large sample sizes; Chen and Shapiro [10], who proposed an alternative to the SW test based on normalized spacings; and Rahman and Govindarajulu [11], who defined new weights for the SW test statistic. However, the major drawback of the SW test is the computation time required for large samples, because the vector of weights depends on the covariance matrix of the order statistics of the standard normal distribution.

GoF tests based on moment constraints, such as the skewness and kurtosis coefficients, are also available and are well known to be efficient tools for evaluating normality. These moment based tests include the skewness test, the kurtosis test, the DP test, and the JB test. These tests combine moment constraints to check for deviations from normality. They are often referred to as omnibus tests because of their ability to detect departures from normality whilst not depending upon the parameters of the normal distribution. The use of moment based tests coupled with the empirical likelihood methodology has recently attracted the attention of researchers developing GoF tests for normality [12, 13]. Dong and Giles [12] proposed an empirical likelihood ratio (ELR) test utilizing the empirical likelihood (EL) methodology of Owen [14]. They used the first four moment conditions of the normal distribution, and their test outperformed the other common tests studied against several alternative distributions. Our study follows from the work of Shan et al. [13], who proposed a simple ELR test for normality based on moment constraints using a standardized normal variable. Their test proved to be more powerful than other well-known GoF tests for small to moderate sample sizes under several alternative distributions. In this study we adopt their approach and focus on the construction of a simple ELR test for normality using the moment constraints of the half-normal distribution. The next section outlines the development of our proposed test, followed by Monte Carlo simulations. A real data example is then presented, and the findings, conclusions, and potential areas of future research are discussed.

#### 2. ELR Test Development

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed nonordered random variables. The intention is to assess whether the observed data are normally distributed. Thus we test the null hypothesis

$$H_0: X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2), \tag{1}$$

where $\mu$ and $\sigma^2$ are considered to be unknown parameters. We propose standardizing the observations using the transformation

$$Z_i = \frac{X_i - \bar{X}}{s}, \quad i = 1, 2, \ldots, n, \tag{2}$$

where $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and the standard deviation $\sigma$ is estimated by $s$, with $s^2 = (n-1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ the unbiased estimator of $\sigma^2$. One can also use the maximum likelihood estimate (MLE) $\hat{\sigma}$, where $\hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n}(X_i - \hat{\mu})^2$ and $\hat{\mu} = \bar{X}$. Both $s$ and $\hat{\sigma}$ are known to converge to $\sigma$ as $n$ approaches $\infty$. We also used an alternative transformation (3), following Lin and Mudholkar's [17] work, which eliminates the dependency that exists between $\bar{X}$ and $s$ on the data distribution. In this transformation each observation $X_i$, $i = 1, 2, \ldots, n$, is centred and scaled using the mean and standard deviation of the remaining $n - 1$ observations. As $n$ gets large, the standardized data points become asymptotically independent.

If $Z \sim N(0, 1)$, then the absolute value $|Z|$ follows a standardized half-normal distribution. It also follows that if $X_i \sim N(\mu, \sigma^2)$, then the modulus of the standardized variables obtained from (2) or (3) follows a standardized half-normal random variable with mean $\sqrt{2/\pi}$ and second moment 1. The standardized form of the half-normal distribution is also known as the $\chi$-distribution with $\nu = 1$ degree of freedom. The standardized half-normal random variable has a PDF given by

$$f(z) = \sqrt{\frac{2}{\pi}}\exp\left(-\frac{z^2}{2}\right), \quad z \ge 0, \tag{4}$$

and we denote it as $|Z| \sim HN(0, 1)$. Following Prudnikov et al. [18], the $k$th moment of the standardized half-normal variable, for some integer $k$, is as outlined in the proposition below.

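Proposition 1. *Let $|Z| \sim HN(0, 1)$. For $k = 1, 2, \ldots, n$, the moments are given by $E(|Z|^k) = 2^{k/2}\,\Gamma\left(\frac{k+1}{2}\right)/\sqrt{\pi}$ (5), where $\Gamma(\cdot)$ denotes the gamma function.*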

We then derived the first four moments using the function given in (5). These moments are easily obtained as follows.

Corollary 2. *Let $|Z| \sim HN(0, 1)$. The first two moments of $|Z|$, that is, $E(|Z|)$ and $E(|Z|^2)$, are given by $E(|Z|) = \sqrt{2/\pi}$ and $E(|Z|^2) = 1$.*

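Corollary 3. *Let $|Z| \sim HN(0, 1)$. The skewness and kurtosis coefficients of $|Z|$ are given by $\sqrt{2}\,(4 - \pi)/(\pi - 2)^{3/2} \approx 0.995$ and $3 + 8(\pi - 3)/(\pi - 2)^2 \approx 3.869$, respectively.*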

In this study we used the first four moment constraints of the standardized half-normal distribution.
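As a quick numerical check of these moment constraints, the following minimal Python sketch evaluates the closed form $E(|Z|^k) = 2^{k/2}\,\Gamma((k+1)/2)/\sqrt{\pi}$ for the standard half-normal distribution and compares it with a direct numerical integral of the density. The function names are illustrative assumptions and do not come from the original study, which used R.

```python
import math


def half_normal_moment(k):
    """Raw moment E|Z|^k of the standard half-normal distribution (closed form)."""
    return 2.0 ** (k / 2.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)


def numeric_moment(k, upper=12.0, steps=100_000):
    """Midpoint-rule integral of z^k * sqrt(2/pi) * exp(-z^2/2) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        z = (j + 0.5) * h
        total += z ** k * math.exp(-0.5 * z * z)
    return math.sqrt(2.0 / math.pi) * total * h


def skewness_and_kurtosis():
    """Skewness and (non-excess) kurtosis of |Z| from its first four raw moments."""
    m1, m2, m3, m4 = (half_normal_moment(k) for k in range(1, 5))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, half_normal_moment(k), numeric_moment(k))
    print(skewness_and_kurtosis())
```

The first four raw moments are the targets that the weighted sample moments of the absolute standardized observations are required to match under the null hypothesis.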

##### 2.1. The ELR Based Test Statistic

We used an empirical likelihood ratio (ELR) test to construct our test statistic. Our aim was to test $H_0$ against the alternative $H_1$. In order to achieve this, we constructed our test statistic as follows. Consider nonordered observations $X_1, X_2, \ldots, X_n$ that are independent and identically distributed with unknown $\mu$ and $\sigma^2$. The intention is to perform a GoF test of the distributional assumption that $X_1, X_2, \ldots, X_n$ are consistent with a normal distribution. Let $|Z_1|, |Z_2|, \ldots, |Z_n|$ be the absolute standardized variables obtained from $X_1, X_2, \ldots, X_n$. Under normality, the transformed observations have the moment function given in Proposition 1, which we write as $\alpha_k = E(|Z|^k)$. Following the EL methodology, we assigned a probability parameter $p_i$ to each transformed observation $|Z_i|$ and formulated the EL function

$$L = \prod_{i=1}^{n} p_i,$$

where the $p_i$'s satisfy the fundamental properties of probability, that is, $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$. The $p_i$'s are chosen subject to unbiased moment conditions, and the EL method uses them to maximize the EL function. In this formulation the sample moments of $|Z|$ are weighted by the $p_i$'s, which are the elements of the EL function. Under $H_0$, the four unbiased empirical moment equations have the form

$$\sum_{i=1}^{n} p_i |Z_i|^k - \alpha_k = 0, \quad k = 1, 2, 3, 4.$$

The composite hypotheses for the ELR test are given by

$$H_0: E(|Z|^k) = \alpha_k \quad \text{versus} \quad H_1: E(|Z|^k) \ne \alpha_k.$$

Alternatively, considering the above unbiased empirical moment equations, the hypotheses for the ELR test can be written as

$$H_0: \sum_{i=1}^{n} p_i |Z_i|^k = \alpha_k \quad \text{versus} \quad H_1: \sum_{i=1}^{n} p_i |Z_i|^k \ne \alpha_k.$$

The nonparametric empirical likelihood function corresponding to these hypotheses has the form $\prod_{i=1}^{n} p_i$, where the unknown probability parameters $p_i$ are attained under $H_0$ and $H_1$. Under $H_0$ the EL function is maximized with respect to the $p_i$'s subject to two constraints:

$$\sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i |Z_i|^k = \alpha_k.$$

Following this, the weights are identified as

$$p_i = \frac{1}{n\left[1 + \lambda_k\left(|Z_i|^k - \alpha_k\right)\right]}, \quad i = 1, 2, \ldots, n.$$

Using the Lagrangian multipliers technique, it can be shown that the maximum EL function under $H_0$ can be expressed as

$$L(H_0) = \prod_{i=1}^{n} \frac{1}{n\left[1 + \lambda_k\left(|Z_i|^k - \alpha_k\right)\right]},$$

where $\lambda_k$ is a root of

$$\sum_{i=1}^{n} \frac{|Z_i|^k - \alpha_k}{1 + \lambda_k\left(|Z_i|^k - \alpha_k\right)} = 0.$$

Under the alternative hypothesis, the moment constraint is not required to identify the weights $p_i$ that maximize the EL function; only $\sum_{i=1}^{n} p_i = 1$ is required. Thus, under $H_1$, the weights are $p_i = 1/n$ and the nonparametric EL function is given by

$$L(H_1) = \prod_{i=1}^{n} \frac{1}{n} = n^{-n}.$$

Now let $-2\log R_k$, with $R_k = L(H_0)/L(H_1)$, be the $-2$ log likelihood ratio test statistic for the hypotheses $H_0$ versus $H_1$. It should be noted that, under $H_0$, minus two times the log ELR has an asymptotic $\chi^2$ limiting distribution [19]. With simple substitution the test statistic simplifies to

$$-2\log R_k = 2\sum_{i=1}^{n} \log\left[1 + \lambda_k\left(|Z_i|^k - \alpha_k\right)\right].$$

We compared the test statistic with size adjusted critical values in order to decide whether or not to reject $H_0$. We then proposed to reject the null hypothesis if

$$\max_{k \in K}\left(-2\log R_k\right) > C_\alpha,$$

where $C_\alpha$ is the test threshold, the $100(1 - \alpha)\%$ percentile of the $\chi^2$ distribution, whilst $k \in K$ are integer values representing the set of moment constraints over which the test statistic is maximized. As recommended by Dong and Giles [12], we used the first four moment constraints; that is, we set $K = \{1, 2, 3, 4\}$. We refer to two versions of the test: the first transforms the data using (2), and the second, alternative test transforms the data using (3).

Our test statistic is a CUSUM-type statistic as classified by Vexler and Wu [20]. Vexler and Wu [20] stated that, based on the change point literature, a common alternative is to use the Shiryaev-Roberts (SR) statistic in place of the CUSUM-type statistic (see, for example, [21, 22]). We also considered the corresponding classical SR statistic in our setting. Vexler, Liu, and Pollak [23] showed that the classical SR statistic and the simple CUSUM-type statistic have almost equivalent optimal statistical properties due to their common null-martingale basis. Moreover, the classical SR statistic is adapted from the CUSUM-type statistic.
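The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: it assumes transformation (2) with the sample standard deviation, one moment constraint per order $k$, the target $E(|Z|^k) = 2^{k/2}\,\Gamma((k+1)/2)/\sqrt{\pi}$, and a maximum over $k = 1, \ldots, 4$. The Lagrange multiplier is found by plain bisection, which is a stand-in for whichever root finder was actually used, and all function names are hypothetical.

```python
import math
import random
import statistics


def half_normal_moment(k):
    """Target moment E|Z|^k for a standard half-normal variable."""
    return 2.0 ** (k / 2.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)


def neg2_log_elr(w, target, iterations=200):
    """-2 log EL ratio for the single constraint sum(p_i * w_i) = target."""
    d = [wi - target for wi in w]
    lowest, highest = min(d), max(d)
    if lowest >= 0.0 or highest <= 0.0:
        return math.inf  # target lies outside the convex hull of the data
    # The multiplier must keep every 1 + lam * d_i strictly positive.
    lo, hi = -1.0 / highest, -1.0 / lowest
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        # Strictly decreasing in lam on (lo, hi); its root is the multiplier.
        return sum(di / (1.0 + lam * di) for di in d)

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * di) for di in d)


def elr_normality_statistic(x, orders=(1, 2, 3, 4)):
    """Maximum over k of -2 log ELR_k, using |z_i| from transformation (2)."""
    mean = sum(x) / len(x)
    sd = statistics.stdev(x)
    abs_z = [abs(xi - mean) / sd for xi in x]
    return max(
        neg2_log_elr([a ** k for a in abs_z], half_normal_moment(k)) for k in orders
    )


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(10.0, 2.0) for _ in range(30)]
    print(elr_normality_statistic(sample))
```

Because the statistic depends on the data only through the standardized values, it is unchanged by shifting or rescaling the sample, which is why critical values can be simulated from a standard normal distribution.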

Shan et al. [13] used Monte Carlo experiments to compare the CUSUM-type statistic for their ELR test for normality with an equivalent classical SR statistic and based on the relative simplicity of the CUSUM-type statistic, as well as its power properties, the authors opted to use the CUSUM-type statistic for their study. We conducted a numerical experiment to compare power for the CUSUM-type and SR statistic for our proposed test statistics with increased moment constraints and, based on the same reasons given by Shan et al. [13], we decided to use the CUSUM-type statistic for our Monte Carlo comparisons. Also, from the results, outperformed , hence was our preferred test. For all further comparisons, was excluded in this study. Findings for this Monte Carlo experiment are presented in Table 4. However, it should be noted from these findings that has the potential to be superior to under certain alternatives. Further investigations to uncover the alternatives in which is superior to are a potential area of future research which will not be further addressed in this study. The next section will outline the Monte Carlo simulation procedures using the R statistical package.

#### 3. Monte Carlo Simulation Study

We used the R statistical package to implement our Monte Carlo simulation procedures for the power comparisons as well as the assessment of our preferred proposed test. It should be noted that other standard statistical packages can easily be used to implement our proposed tests. Before conducting any assessments and evaluations of the proposed test, we first had to determine the size adjusted critical values.

##### 3.1. Size Adjusted Critical Values

Since the proposed ELR test is an asymptotic test, we computed the unknown actual sizes for finite samples using Monte Carlo simulations with 50,000 replications. Motivated by practical applications, we considered critical values for relatively small sample sizes, because most datasets in the applied statistical sciences fall within this range. The actual rejection rate for a given sample size is the total number of rejections divided by the total number of replications. Data were simulated from a standard normal distribution. The stored test statistics were ordered and then used to determine the percentiles of the empirical distribution, from which the size adjusted critical values were obtained.
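The simulation procedure for the critical values can be sketched as follows. This Python sketch is an illustration only: the function `size_adjusted_critical_value` accepts any test statistic as a callable, and `stand_in_statistic` is a deliberately simple placeholder (the scaled gap between the sample mean of the absolute standardized values and $\sqrt{2/\pi}$), not the ELR statistic itself. The default of 5,000 replications is an assumption chosen to keep the run time short; the study used 50,000.

```python
import math
import random
import statistics


def standardize_abs(x):
    """Absolute standardized values |x_i - mean| / s."""
    mean = sum(x) / len(x)
    sd = statistics.stdev(x)
    return [abs(xi - mean) / sd for xi in x]


def stand_in_statistic(x):
    """Illustrative stand-in only (NOT the ELR statistic):
    scaled gap between the sample mean of |z_i| and sqrt(2/pi)."""
    abs_z = standardize_abs(x)
    gap = sum(abs_z) / len(abs_z) - math.sqrt(2.0 / math.pi)
    return math.sqrt(len(x)) * abs(gap)


def size_adjusted_critical_value(statistic, n, alpha=0.05, reps=5000, seed=2018):
    """Empirical (1 - alpha) quantile of `statistic` over N(0, 1) samples of size n."""
    rng = random.Random(seed)
    null_stats = sorted(
        statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    # Order statistic whose rank is ceil((1 - alpha) * reps), as a 0-based index.
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return null_stats[index]


if __name__ == "__main__":
    for n in (20, 30, 50, 80):
        print(n, size_adjusted_critical_value(stand_in_statistic, n))
```

Rejecting whenever the statistic exceeds this quantile gives an empirical rejection rate of about $\alpha$ under normality for that sample size, which is the sense in which the critical values are size adjusted.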

##### 3.2. ELR Test Assessment

The power of the proposed test was compared to that of common existing GoF tests, namely the Anderson-Darling (AD) test [3, 4], the modified Kolmogorov-Smirnov (KS) test [2], the Cramer-von Mises (CVM) test [24–26], the Jarque-Bera (JB) test [6], the Shapiro-Wilk (SW) test [5], the density based empirical likelihood ratio (DB) test [16], and the simple and exact empirical likelihood test based on moment relations (SEELR) [13], at the 5% significance level. Power simulations were done using 5,000 replications for all tests with varying sample sizes ($n$ = 20, 30, 50, and 80) against different alternative distributions. We adopted the alternative distributions used by Shan et al. [13], which cover a wide range of both symmetric and asymmetric applied distributions. To assess the robustness and applicability of our proposed test, we conducted a bootstrap study using real data.
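A power comparison of this kind reduces to counting rejections over samples drawn from an alternative distribution. The Python sketch below shows the idea under stated assumptions: `estimate_power` works with any statistic and critical value supplied by the caller, the four alternatives listed are illustrative choices that can be generated from the standard library and are not the list used in the study, and `stand_in_statistic` is a simple placeholder rather than the ELR statistic.

```python
import math
import random
import statistics

# Illustrative alternatives only; the study's own list follows Shan et al. [13].
ALTERNATIVES = {
    "uniform(0,1)": lambda rng: rng.random(),
    "laplace(0,1)": lambda rng: rng.expovariate(1.0) - rng.expovariate(1.0),
    "cauchy(0,1)": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
    "exponential(1)": lambda rng: rng.expovariate(1.0),
}


def stand_in_statistic(x):
    """Illustrative stand-in only (NOT the ELR statistic)."""
    mean = sum(x) / len(x)
    sd = statistics.stdev(x)
    mean_abs_z = sum(abs(xi - mean) / sd for xi in x) / len(x)
    return math.sqrt(len(x)) * abs(mean_abs_z - math.sqrt(2.0 / math.pi))


def estimate_power(statistic, critical_value, sampler, n, reps=5000, seed=7):
    """Fraction of simulated samples of size n whose statistic exceeds critical_value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [sampler(rng) for _ in range(n)]
        if statistic(sample) > critical_value:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    n, reps = 20, 5000
    null_rng = random.Random(1)
    null_stats = sorted(
        stand_in_statistic([null_rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    # 5% size adjusted critical value from the simulated null distribution.
    crit = null_stats[min(reps - 1, math.ceil(0.95 * reps) - 1)]
    for name, sampler in ALTERNATIVES.items():
        print(name, estimate_power(stand_in_statistic, crit, sampler, n, reps))
```

Passing the statistic and the sampler as arguments keeps the same loop usable for every test and every alternative in a comparison, so that differences in estimated power reflect the tests rather than the simulation code.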

#### 4. Results of the Monte Carlo Simulations

This section presents the findings of the power comparisons for the different categories of the alternative distributions considered. The results of the power comparisons are presented in Tables 5–8. For symmetric alternatives defined on $(-\infty, \infty)$, our new test outperformed all other studied tests except the JB test, to which it was slightly inferior. For symmetric distributions defined on $(0, 1)$, our proposed test was comparable to the DB test and significantly outperformed the other tests studied. However, when the alternative is Beta (0.5, 0.5), the proposed test is comparable to the SW and SEELR tests whilst only outperforming the KS test, the CVM test, and the JB test.

As for asymmetric distributions defined on $(0, \infty)$, the SW and SEELR tests are the most powerful and should be the preferred tests in these cases. The AD and DB tests are comparable, and they performed better than the proposed test as well as the KS and CVM tests. Lastly, in the category of asymmetric alternative distributions defined on $(-\infty, \infty)$, the proposed test was comparable to the SEELR test at low sample sizes for the non-central $t$-distributions. The SW test outperformed all the tests considered in this study under these asymmetric alternative distributions. For the ELR based tests, only the SEELR test was comparable to the common existing tests studied, that is, the AD test, the KS test, the CVM test, and the JB test.

Overall, when considering all the normality tests with respect to all of the alternative distributions considered, the JB, the proposed, and the SW tests are generally the most powerful for symmetric alternatives defined on $(-\infty, \infty)$, whilst the DB and the proposed tests are the most powerful for symmetric alternatives defined on $(0, 1)$. On the other hand, the SEELR and the SW tests are the most powerful for asymmetric alternatives defined on $(0, \infty)$, whereas the JB and SW tests are the most powerful for asymmetric alternatives defined on $(-\infty, \infty)$.

It was of paramount importance for us to determine the computational cost of the new algorithms by comparing the computation time of the proposed test with that of the existing tests considered. To assess this, we used the R benchmark tool on a notebook running 64-bit Windows 10 Home edition, equipped with a 4th generation Intel Core i5-4210U processor (1.7 GHz) and 4 GB of PC3 DDR3L SDRAM. We ran 5,000 simulations for each test at a fixed sample size. The results (see Table 1) show that our proposed approach has a clear advantage only over the widely known JB test. Our proposed methods are comparable to the SEELR test but inferior to the DB test. The SW, CVM, KS, and AD tests are computationally more efficient in terms of time than the rest of the studied tests.