The Scientific World Journal
Volume 2014, Article ID 746451, 27 pages
http://dx.doi.org/10.1155/2014/746451
Research Article

Comparative Performance of Four Single Extreme Outlier Discordancy Tests from Monte Carlo Simulations

1Departamento de Sistemas Energéticos, Instituto de Energías Renovables, Universidad Nacional Autónoma de México, 62580 Temixco, MOR, Mexico
2Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, MOR, Mexico
3Posgrado en Ciencias, Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, MOR, Mexico
4Departamento de Computación, Instituto de Energías Renovables, Universidad Nacional Autónoma de México, 62580 Temixco, MOR, Mexico

Received 18 October 2013; Accepted 5 December 2013; Published 11 March 2014

Academic Editors: J. Pacheco and Z. Wu

Copyright © 2014 Surendra P. Verma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Using highly precise and accurate Monte Carlo simulations of 20,000,000 replications and 102 independent simulation experiments with extremely low simulation errors and total uncertainties, we evaluated the performance of four single outlier discordancy tests (Grubbs test N2, Dixon test N8, skewness test N14, and kurtosis test N15) for normal samples of sizes 5 to 20. Statistical contaminations of a single observation resulting from parameters called $\delta$ from ±0.1 up to ±20 for modeling the slippage of central tendency or $\varepsilon$ from ±1.1 up to ±200 for slippage of dispersion, as well as no contamination ($\delta = 0$ and $\varepsilon = \pm 1$), were simulated. Because of the use of precise and accurate random and normally distributed simulated data, very large replications, and a large number of independent experiments, this paper presents a novel approach for precise and accurate estimations of power functions of four popular discordancy tests and, therefore, should not be considered as a simple simulation exercise unrelated to probability and statistics. From both criteria of the Power of Test proposed by Hayes and Kinsella and the Test Performance Criterion of Barnett and Lewis, Dixon test N8 performs less well than the other three tests. The overall performance of these four tests could be summarized as N2 ≅ N15 > N14 > N8.

1. Introduction

As summarized by Barnett and Lewis [1], a large number of discordancy tests are available for determining whether an outlier is an extreme (i.e., legitimate) or a discordant (i.e., contaminant) observation in normal samples at a given confidence or significance level. These discordancy tests are likely to be characterized by different power or performance. Numerous researchers [1–6] have commented on the properties of these tests under the slippage of location or central tendency and slippage of scale or dispersion by one or more observations, but very few studies have been reported on the use of Monte Carlo simulation for precise and accurate performance measures of these tests. More recently, using Monte Carlo simulation, Hayes and Kinsella [7] evaluated the performance criteria of two discordancy tests (Grubbs single outlier test N2 and Grubbs multiple outlier test N4k2; the nomenclature is after Barnett and Lewis [1]) and discussed their spurious and nonspurious components of type II error and power function. However, four single extreme outlier type discordancy tests, also called two-sided discordancy tests by Barnett and Lewis [1], are available, which are Grubbs type N2, Dixon type N8, skewness N14, and kurtosis N15. Their relative performance measures should be useful for choosing among the different tests for specific applications.

Monte Carlo simulation methods have been extensively used in numerous simulation studies [8–18]. Some of the relatively recent papers are by Efstathiou [12], Gottardo et al. [13], Khedhiri and Montasser [14], P. A. Patel and J. S. Patel [15], Noughabi and Arghami [16], Krishnamoorthy and Lian [17], and Verma [18]. For example, Noughabi and Arghami [16] compared seven normality tests (Kolmogorov-Smirnov, Anderson-Darling, Kuiper, Jarque-Bera, Cramer von Mises, Shapiro-Wilk, and Vasicek) for sample sizes of 10, 20, 30, and 50 and under different circumstances recommended the use of Jarque-Bera, Anderson-Darling, Shapiro-Wilk, and Vasicek tests.

We used Monte Carlo simulations to evaluate comparative efficiency of four extreme outlier type discordancy tests (N2, N8, N14, and N15, the nomenclature after Barnett and Lewis [1]) for sample sizes of 5 to 20. Our approach to the statistical problem of test performance is novel because, instead of using commercial or freely available software, we programmed and generated extremely precise and accurate random numbers and normally distributed data, used very large replications of 20,000,000, performed 102 independent experiments, and reduced the simulation errors to such an extent that the differences in test performance are far greater than the total uncertainties expressed as 99% confidence intervals of the mean. This is an approach hitherto practiced by none (see, e.g., [8–18]) except by our group [19–23]. This work, therefore, supersedes the approximate simulation results of test performance reported by the statisticians Hayes and Kinsella [7].

2. Discordancy Tests

For a data array $x_1, x_2, \ldots, x_{n-1}, x_n$ or an ordered array $x_{(1)}, x_{(2)}, \ldots, x_{(n-1)}, x_{(n)}$ of $n$ observations, with mean $\bar{x}$ and standard deviation $s$, four test statistics were objectively evaluated in this work. For a statistically contaminated sample of size $n$ of 5 to 20, $n-1$ observations of this data array were obtained from a normal distribution $N(0,1)$ and the remaining observation was taken from a central tendency shifted distribution $N(0+\delta, 1)$ or dispersion shifted distribution $N(0, 1\times\varepsilon)$, where the contaminant parameters $\delta$ for modeling slippage of central tendency and $\varepsilon$ for slippage of dispersion can be either positive or negative. For an uncontaminated sample, the simulations were done for $\delta = 0$ and $\varepsilon = \pm 1$. In order to achieve an unbiased comparison, the application of the tests was always forced to the upper outlier $x_{(n)}$ for positive values of $\delta$ or $\varepsilon$ and to the lower outlier $x_{(1)}$ for negative values of $\delta$ or $\varepsilon$.

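Thus, the first test was the Grubbs test N2 [24] for an extreme outlier $x_{(n)}$ or $x_{(1)}$, for which the test statistic is as follows:

$$\mathrm{TN2} = \max\left(\frac{x_{(n)} - \bar{x}}{s},\ \frac{\bar{x} - x_{(1)}}{s}\right). \tag{1}$$

The second test was the Dixon test N8 [2] as follows:

$$\mathrm{TN8} = \max\left(\frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}},\ \frac{x_{(2)} - x_{(1)}}{x_{(n)} - x_{(1)}}\right). \tag{2}$$

The third test was sample skewness N14 as (note that the absolute value is used for evaluation):

$$\mathrm{TN14} = \left|\frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}\right|. \tag{3}$$

Finally, the fourth test was the sample kurtosis N15 as follows:

$$\mathrm{TN15} = \frac{n\,\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{2}}. \tag{4}$$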
All tests were applied at a strict 99% confidence level using the new precise and accurate critical values (CV99) simulated using Monte Carlo procedure by Verma et al. [19] for N2, N8, and N15 and Verma and Quiroz-Ruiz [20] for N14, which permitted an objective comparison of their performance.
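As an illustration of the four statistics named in this section, the following minimal Python sketch computes them in their standard two-sided forms as given by Barnett and Lewis [1]. The function name `discordancy_statistics` and the example data are hypothetical and are not taken from the original work; the sketch also does not force the test to one extreme, as was done in the simulations.

```python
import math
import statistics


def discordancy_statistics(sample):
    """Return two-sided TN2, TN8, TN14, and TN15 for one sample.

    Assumes at least three observations that are not all equal;
    otherwise the denominators below would be zero.
    """
    x = sorted(sample)
    n = len(x)
    mean = statistics.fmean(x)
    s = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    spread = x[-1] - x[0]    # range of the ordered array
    # Sums of powers of deviations from the mean (used by N14 and N15).
    m2 = sum((v - mean) ** 2 for v in x)
    m3 = sum((v - mean) ** 3 for v in x)
    m4 = sum((v - mean) ** 4 for v in x)
    return {
        "TN2": max((x[-1] - mean) / s, (mean - x[0]) / s),
        "TN8": max((x[-1] - x[-2]) / spread, (x[1] - x[0]) / spread),
        "TN14": abs(math.sqrt(n) * m3 / m2 ** 1.5),
        "TN15": n * m4 / m2 ** 2,
    }


# Hypothetical example: one high value among otherwise similar observations.
example = discordancy_statistics([1.0, 2.0, 3.0, 4.0, 10.0])
```

Each value in `example` would then be compared with the corresponding critical value for the sample size in question; those critical values are not reproduced here.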

3. Monte Carlo Simulations

Random numbers uniformly distributed in the interval (0, 1), $U(0,1)$, and normal random variates $N(0,1)$ were generated from the method summarized by Verma and Quiroz-Ruiz [21]. However, instead of only 10 series or streams of $N(0,1)$ as done by Verma and Quiroz-Ruiz [21], a total of 102 different streams of $N(0,1)$ were simulated. Similarly, the replications were much more numerous than those used by Verma and Quiroz-Ruiz [21] for generating precise and accurate critical values.

For a data array of size $n$, $(n-1)$ observations were drawn from one stream of $N(0,1)$ and the contaminant observation was added from a different central tendency shifted stream of $N(0+\delta, 1)$, where $\delta$ was varied from +0.1 to +20 and from −0.1 to −20, or from a dispersion shifted distribution $N(0, 1\times\varepsilon)$, where $\varepsilon$ was varied from +1.1 to +200 and from −1.1 to −200. The simulation experiments were also carried out for uncontaminated distributions, in which $(n-1)$ observations were taken from one stream of normal random variates $N(0,1)$ and an additional observation was incorporated from a different stream $N(0,1)$ with no contamination; that is, $\delta = 0$ and $\varepsilon = \pm 1$.

Now, if we were to arrange the complete array from the lowest to the highest observations, the ordered array could be called $x_{(1)}, x_{(2)}, \ldots, x_{(n-1)}, x_{(n)}$ after Barnett and Lewis [1]. All four tests under evaluation could then be applied to the resulting data array.

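If $\delta > 0$, $\delta < 0$, $\varepsilon > +1$, or $\varepsilon < -1$ (the contaminant present), two possibilities would arise for the ordered array as follows (Table 1): (i) the contaminant occupies an inner position in the ordered array; that is, it is not $x_{(n)}$ for positive $\delta$ or $\varepsilon$, or not $x_{(1)}$ for negative $\delta$ or $\varepsilon$; this array is called a $\bar{C}$ type event and the contaminant was not used in the test statistic; and (ii) the contaminant occupies the extreme position; that is, it is $x_{(n)}$ for positive $\delta$ or $\varepsilon$, or $x_{(1)}$ for negative $\delta$ or $\varepsilon$; this array was called a $C$ type event and the contaminant was used in the test statistic.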
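The sampling and event classification described above can be sketched in Python as follows. This is only an illustrative sketch using the standard library generator rather than the authors' own random number streams; the function name `simulate_sample` is hypothetical, and treating the dispersion parameter as a multiplier of the standard deviation whose sign only selects the tested extreme is an assumption.

```python
import random


def simulate_sample(n, delta=0.0, epsilon=1.0, rng=random):
    """Draw n - 1 values from N(0, 1) plus one contaminant observation.

    Returns the ordered array, whether the upper extreme is the one to be
    tested, and whether the contaminant occupies that extreme position.
    """
    legit = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
    # Assumption: the contaminant is delta + |epsilon| * z with z ~ N(0, 1);
    # the sign of delta (or of epsilon) only selects which extreme is tested.
    contaminant = delta + abs(epsilon) * rng.gauss(0.0, 1.0)
    test_upper = delta > 0 or (delta == 0 and epsilon > 0)
    if test_upper:
        contaminant_is_extreme = contaminant > max(legit)
    else:
        contaminant_is_extreme = contaminant < min(legit)
    ordered = sorted(legit + [contaminant])
    return ordered, test_upper, contaminant_is_extreme


# Hypothetical usage: a strongly shifted contaminant in a sample of size 10.
ordered, test_upper, is_extreme = simulate_sample(10, delta=5.0, rng=random.Random(42))
```

When `is_extreme` is true the array corresponds to case (ii) above, in which the contaminant enters the test statistic; otherwise it corresponds to case (i).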

Table 1: Sample simulation and test outcome (modified after Hayes and Kinsella [7]).

To an event of type $\bar{C}$, when any of these four tests (N2, N8, N14, or N15) was applied, the outcome was called either a spurious type II error probability ($\pi_{\bar{D}\bar{C}}$) if the test was not significant or a spurious power ($\pi_{D\bar{C}}$) if it was significant (Table 1). For this decision, the calculated test statistic TN (TN2, TN8, TN14, or TN15) for a sample was compared with the respective CV99 [19, 20]. If TN ≤ CV99, the outcome of the test was considered as not significant; else, when TN > CV99, the outcome of the test was considered as significant (Table 1).

Similarly, to an event of type $C$, when a discordancy test was applied, the outcome was either a nonspurious type II error probability ($\pi_{\bar{D}C}$) if the test was not significant or a nonspurious power ($\pi_{DC}$) if the test was significant (Table 1).

If $\delta = 0$ or $\varepsilon = \pm 1$ (the contaminant absent) and a discordancy test was applied to the ordered array to evaluate the extreme observation $x_{(n)}$ or $x_{(1)}$, the outcome would be either a true negative if the test was not significant, that is, if it failed to detect $x_{(n)}$ or $x_{(1)}$ as discordant, or a type I error if the test was significant, that is, if it succeeded in detecting $x_{(n)}$ or $x_{(1)}$ as discordant (Table 1).
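The outcome bookkeeping of this section can be illustrated with a small Python sketch that tallies the four contaminant-present outcomes for a Grubbs-type statistic applied to the forced extreme. The function names, the modest number of replications, and in particular the critical value of 2.4 are hypothetical placeholders chosen for illustration; they are not the CV99 values of [19, 20], and the standard library generator is used instead of the authors' streams.

```python
import random
import statistics


def tn2_forced(sample, upper):
    """Grubbs-type statistic applied only to the chosen extreme."""
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return (max(sample) - mean) / s if upper else (mean - min(sample)) / s


def tally_outcomes(n, delta, critical_value, replications, seed=0):
    """Estimate the four outcome probabilities for a location-shifted contaminant."""
    rng = random.Random(seed)
    upper = delta >= 0
    counts = {"spurious_type2": 0, "spurious_power": 0,
              "nonspurious_type2": 0, "nonspurious_power": 0}
    for _ in range(replications):
        legit = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        contaminant = rng.gauss(delta, 1.0)
        # Is the contaminant the extreme observation that gets tested?
        extreme = contaminant > max(legit) if upper else contaminant < min(legit)
        significant = tn2_forced(legit + [contaminant], upper) > critical_value
        prefix = "nonspurious" if extreme else "spurious"
        suffix = "power" if significant else "type2"
        counts[f"{prefix}_{suffix}"] += 1
    return {key: value / replications for key, value in counts.items()}


# Hypothetical run; 2.4 is a placeholder critical value, not a published CV99.
probs = tally_outcomes(n=10, delta=5.0, critical_value=2.4, replications=5_000)
```

The four estimated probabilities sum to one by construction, and their precision improves only slowly with the number of replications, which is why very large replication counts matter for this kind of comparison.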

4. Test Performance Criteria

Hayes and Kinsella [7] documented that a good discordancy test would be characterized by a high nonspurious power probability (high $\pi_{DC}$), a low spurious power probability (low $\pi_{D\bar{C}}$), and a low nonspurious type II error probability (low $\pi_{\bar{D}C}$).

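Hayes and Kinsella [7] defined the Power of Test ($\Omega$) as

$$\Omega = \pi_{DC} + \pi_{D\bar{C}}. \tag{5}$$

Similarly, they also defined the Test Performance Criterion ($\pi_{D|C}$, which is equivalent to the probability P5 of Barnett and Lewis [1]) or the Conditional Power as

$$\pi_{D|C} = \frac{\pi_{DC}}{\pi_{DC} + \pi_{\bar{D}C}}. \tag{6}$$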
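A short Python sketch may help to fix the two criteria in terms of the outcome probabilities of Table 1. It assumes that the Power of Test is the total probability of a significant outcome (spurious plus nonspurious power) and that the Conditional Power is the nonspurious power divided by the probability that the contaminant is the tested extreme. The function names and the numerical inputs are hypothetical.

```python
def power_of_test(spurious_power, nonspurious_power):
    """Total probability of a significant test outcome."""
    return spurious_power + nonspurious_power


def conditional_power(nonspurious_power, nonspurious_type2):
    """Probability of a significant outcome given that the contaminant is extreme."""
    denominator = nonspurious_power + nonspurious_type2
    if denominator == 0:
        return float("nan")  # the contaminant was never the tested extreme
    return nonspurious_power / denominator


# Hypothetical probabilities, chosen only to illustrate the arithmetic.
omega = power_of_test(spurious_power=0.002, nonspurious_power=0.60)
p5 = conditional_power(nonspurious_power=0.60, nonspurious_type2=0.30)
```

With these illustrative inputs, `omega` is 0.602 and `p5` is 2/3. The two criteria approach each other when the contaminant almost always occupies the extreme position and spurious outcomes become rare.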

5. Optimum Replications

The optimum replications required for minimizing the errors of Monte Carlo simulations were decided from representative results summarized in Figures 1 and 2, in which the vertical error bar represents total uncertainty at 99% confidence level ($u_{99}$, equivalent to the 99% confidence interval of the mean) for 102 simulation experiments. For example, for a representative sample size $n$ and contaminant parameter $\delta$, the Power of Test ($\Omega$) is plotted in Figures 1(a)–1(d) as a function of the replications (up to 20,000,000) for N2, N8, N14, and N15. Although mean values remain practically constant (within the confidence limits of the mean) for replications of about 8,000,000, still higher replications of 20,000,000 (Figures 1 and 2) were used in all simulation experiments.

Figure 1: Determination of optimum simulation replication for Power of Test ($\Omega$) as a function of replications for a representative sample size $n$ and contaminant parameter $\delta$; symbols are explained in each figure; the vertical error bars represent uncertainty ($u_{99}$) at 99% confidence level from 102 simulations. (a) test N2; (b) test N8; (c) test N14; and (d) test N15.
Figure 2: Determination of optimum simulation replication for Power of Test ($\Omega$) as a function of replications for all tests N2, N8, N14, and N15; symbols are explained in each figure. Panels (a)–(d) correspond to four different combinations of sample size $n$ and contaminant parameter.

Similarly, $\Omega$ for all four tests as a function of replications is also shown in Figures 2(a)–2(d), which allows a visual comparison of this performance parameter for different sample sizes and $\delta$ values. Error bars ($u_{99}$) for the 102 simulation experiments are not shown for simplicity, but, for replications larger than 10,000,000, they were certainly within the size of the symbols. The replications of 20,000,000 routinely used for comparing the performance of discordancy tests clearly show that the differences among $\Omega$ values (Figures 2(a)–2(d)) are statistically significant at a high confidence level; that is, these differences are much greater than the simulation errors.

Alternatively, following Krishnamoorthy and Lian [17], the simulation error for the replications of 20,000,000 used routinely in our work can be estimated approximately as $2\sqrt{0.5 \times 0.5/20{,}000{,}000} \approx 0.0002$.
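The order of magnitude of such a simulation error can be checked with a few lines of Python. The sketch below assumes the common bound of two binomial standard errors evaluated at a probability of 0.5, where the standard error of an estimated proportion is largest; this form, and the function name `max_simulation_error`, are assumptions made for illustration rather than a statement of the exact expression used in [17].

```python
import math


def max_simulation_error(replications, p=0.5):
    """Two standard errors of a proportion estimated from independent replications."""
    return 2.0 * math.sqrt(p * (1.0 - p) / replications)


error_20m = max_simulation_error(20_000_000)  # about 0.00022
error_8m = max_simulation_error(8_000_000)    # larger, since fewer replications
```

Because the error scales with the inverse square root of the number of replications, halving it requires four times as many replications.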

Because we carried out 102 independent simulation experiments, each with 20,000,000 replications, our simulation errors were even less than the above value. Thus, the Monte Carlo simulations can be considered highly precise. They can also be said to be highly accurate, because our procedure was modified after the highly precise and accurate method of Verma and Quiroz-Ruiz [21]. These authors had shown high precision and accuracy of each of their $U(0,1)$ and $N(0,1)$ experiments and had also applied all kinds of simulated data quality tests suggested by Law and Kelton [25]. Besides, in the present work a large number of such experiments (204 streams of $U(0,1)$ and 102 streams of $N(0,1)$) have been carried out. Therefore, as an innovation in Monte Carlo simulations we present the mean values as well as the total uncertainty ($u_{99}$) of 102 independent experiments in terms of the confidence interval of the mean at the strict 99% confidence level.
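To make the idea of a mean with its 99% uncertainty over independent experiments concrete, a minimal Python sketch follows. It uses a normal quantile as an approximation to the Student's t quantile, which is an assumption that is reasonable for about 100 experiments but slightly underestimates the half-width; the function name `mean_and_u99` and the input values are hypothetical.

```python
import math
import statistics


def mean_and_u99(estimates):
    """Mean of independent estimates and half-width of its 99% confidence interval."""
    n = len(estimates)
    mean = statistics.fmean(estimates)
    standard_error = statistics.stdev(estimates) / math.sqrt(n)
    # Normal approximation to the two-sided 99% Student's t quantile.
    z = statistics.NormalDist().inv_cdf(0.995)
    return mean, z * standard_error


# Hypothetical estimates from 102 independent experiments.
estimates = [0.5, 0.6] * 51
mean_value, u99 = mean_and_u99(estimates)
```

The half-width shrinks with the square root of the number of experiments, so combining many independent experiments tightens the interval beyond what a single experiment provides.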

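Finally, in order to evaluate the test performance, test N2 was used as a reference and percent differences in the mean ($\bar{x}$) values of $\Omega$ or $\pi_{D|C}$ for the other three tests were calculated as

$$\Delta_{Nj} = \left(\frac{\bar{x}_{Nj} - \bar{x}_{N2}}{\bar{x}_{N2}}\right) \times 100, \tag{7}$$

where the subscript $Nj$ stands for N8, N14, or N15.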
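The comparison against the reference test amounts to a relative difference expressed in percent, which can be sketched in Python as follows. The function name `percent_difference` and the reference value of 0.5005 are hypothetical and serve only to illustrate the arithmetic.

```python
def percent_difference(value, reference):
    """Relative difference of a test's mean value from the reference, in percent."""
    return 100.0 * (value - reference) / reference


# Hypothetical reference value for N2; 0.49503 is used as an example test value.
delta_example = percent_difference(0.49503, reference=0.5005)  # roughly -1.1
```

A negative result indicates that the test in question has a lower value of the performance criterion than the reference test.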

6. Results and Discussion

6.1. $\bar{C}$ Type and Contaminant-Absent Events

According to Barnett and Lewis [1], this type of event ($\bar{C}$) is of no major concern, because the contaminant occupies an inner position in the ordered array and the extreme observation $x_{(n)}$ or $x_{(1)}$ under evaluation from discordancy tests is a legitimate observation. An inner position of the contaminant would affect the sample mean and standard deviation much less [1]. For small values of $\delta$ or $\varepsilon$ close to 0 or ±1, respectively, most events generated from the Monte Carlo simulation are of $\bar{C}$ type. The $\pi_{\bar{D}\bar{C}}$ and $\pi_{D\bar{C}}$ values for $n = 5$ to $n = 20$ as a function of $\delta$ are presented in Figures 3(a)–3(d) and Figures 4(a)–4(d), respectively. For $\varepsilon$, these parameters behave very similarly and, therefore, the corresponding diagrams are not presented.

Figure 3: Spurious type II error probability ($\pi_{\bar{D}\bar{C}}$) as a function of $\delta$ from −2.5 to +2.5 for all tests N2, N8, N14, and N15. Values for uncontaminated samples ($\delta = 0$) are shown by open circles. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.
Figure 4: Spurious power probability ($\pi_{D\bar{C}}$) as a function of $\delta$ from −2.5 to +2.5 for all tests N2, N8, N14, and N15. Values for uncontaminated samples ($\delta = 0$) are shown by open circles. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.

When the contaminant is absent ($\delta = 0$ or $\varepsilon = \pm 1$), the $\pi_{\bar{D}\bar{C}}$ and $\pi_{D\bar{C}}$ values are close to the expected values of 0.99 and 0.01, respectively, because the discordancy tests were applied at the 99% confidence level (open circles in Figures 3(a)–3(d) and Figures 4(a)–4(d)). As $\delta$ changes from 0 to about ±2.5, the $\pi_{\bar{D}\bar{C}}$ values slightly increase from 0.99 to about 0.996 for $n = 5$ (Figure 3(a)), 0.996 for $n = 10$ (Figure 3(b)), 0.994–0.995 for $n = 15$ (Figure 3(c)), and 0.993–0.994 for $n = 20$ (Figure 3(d)). The $\pi_{D\bar{C}}$ values show the complementary behavior (Figures 4(a)–4(d)). Because in this type of event ($\bar{C}$) a legitimate extreme observation is being tested, it is desirable that the $\pi_{\bar{D}\bar{C}}$ and $\pi_{D\bar{C}}$ values remain close to the theoretical values of 0.99 and 0.01, respectively, as for contaminant-absent events. This is actually observed in Figures 3 and 4.

6.2. $C$ Type and Contaminant-Absent Events

The $C$ type events are of major consequence for sample statistical parameters. In such events, because the contaminant occupies an extreme outlying position ($x_{(n)}$ or $x_{(1)}$) in an ordered data array, it is desirable that the discordancy tests detect this contaminant observation as discordant. The $\pi_{\bar{D}C}$ and $\pi_{DC}$ values for $n = 5$ to $n = 20$ as a function of $\delta$ are presented in Figures 5(a)–5(d) and Figures 6(a)–6(d), respectively. Similarly, these values as a function of $\varepsilon$ are shown in Figures 7(a)–7(d) and Figures 8(a)–8(d), respectively.

Figure 5: Nonspurious type II error probability ($\pi_{\bar{D}C}$) as a function of $\delta$ from −20 to +20 for all tests N2, N8, N14, and N15. Values for uncontaminated samples ($\delta = 0$) are shown by open circles. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.
Figure 6: Nonspurious power probability ($\pi_{DC}$) as a function of $\delta$ from −20 to +20 for all tests N2, N8, N14, and N15. Values for uncontaminated samples ($\delta = 0$) are shown by open circles. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.
Figure 7: Nonspurious type II error probability ($\pi_{\bar{D}C}$) as a function of $\varepsilon$ from −1 to −200 and +1 to +200 for all tests N2, N8, N14, and N15. Values for uncontaminated samples ($\varepsilon = \pm 1$) are shown by open circles. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.
Figure 8: Nonspurious power probability ($\pi_{DC}$) as a function of $\varepsilon$ from −1 to −200 and +1 to +200 for all tests N2, N8, N14, and N15. Values for uncontaminated samples ($\varepsilon = \pm 1$) are shown by open circles. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.

For uncontaminated samples ($\delta = 0$ in Figures 5(a)–5(d) and Table 2, or $\varepsilon = \pm 1$ in Figures 7(a)–7(d)) the $\pi_{\bar{D}C}$ probability values were close to the theoretical value of 0.99 (which corresponds to the confidence level used for each test). Similarly, for such samples, $\pi_{DC}$ values for all sample sizes were close to the theoretical value of 0.01 (the complement of 0.99; Figures 6 and 8).

Table 2: Nonspurious type II error probability ($\pi_{\bar{D}C}$) parameter for four single extreme outlier discordancy tests.

A complementary behavior of $\pi_{\bar{D}C}$ and $\pi_{DC}$ exists for all other $\delta$ or $\varepsilon$ values as well (Figures 5 and 7 or Figures 6 and 8). Thus, $\pi_{\bar{D}C}$ for all tests decreases sharply from 0.99 for $\delta = 0$ to very small values at large $|\delta|$: about 0.03 for $n = 5$, about 0.01–0.03 for $n = 10$, about 0.006–0.02 for $n = 15$, and about 0.001–0.01 for $n = 20$ (Table 2; Figures 5(a)–5(d)). On the contrary, $\pi_{DC}$ increases very rapidly from very small values of 0.01 to close to the maximum theoretical value of 0.99 (see the complementary behavior in Figures 6(a)–6(d) and Figures 5(a)–5(d)). These probability ($\pi_{\bar{D}C}$ and $\pi_{DC}$) values show a similar behavior for larger values of $\varepsilon$ than for $\delta$ (compare Figures 7 and 8 with Figures 5 and 6, resp.). There are some differences in these probability values among the different tests (Table 2; Figures 5–8), but they will be better discussed for the test performance criteria.

6.3. Test Performance Criteria ($\Omega$ and $\pi_{D|C}$)

These two parameters are plotted as a function of $\delta$ and $\varepsilon$ in Figures 9, 10, 11, and 12 and the most important results are summarized in Tables 3–6. For a good test, both $\Omega$ (5) and $\pi_{D|C}$ (6) should be large [1, 7]. Values of both performance criteria ($\Omega$ and $\pi_{D|C}$) increase as $\delta$ or $\varepsilon$ values depart from the uncontaminated values of $\delta = 0$ or $\varepsilon = \pm 1$ (Figures 9–12; Tables 3–6). However, $\Omega$ and $\pi_{D|C}$ increase less rapidly for smaller $n$ than for larger $n$. For $n = 5$, even for $\delta = \pm 20$ or $\varepsilon = \pm 200$, neither of the two parameters truly reaches the maximum theoretical value of 0.99 (Figure 9(a) to Figure 12(a)). For larger $n$ (10–20), however, both $\Omega$ and $\pi_{D|C}$ get close to this value for all tests and for much smaller values of $\delta$ or $\varepsilon$ than the maximum values of 20 and 200, respectively (Figures 9(b)–9(d) to Figures 12(b)–12(d); Tables 3–6).

Table 3: Power of Test ($\Omega$) values for four single extreme outlier discordancy tests as a function of $\delta$.
Table 4: Test Performance Criterion P5 ($\pi_{D|C}$) values for four single extreme outlier discordancy tests as a function of $\delta$.
Table 5: Power of Test ($\Omega$) values for four single extreme outlier discordancy tests as a function of $\varepsilon$.
Table 6: Test Performance Criterion P5 ($\pi_{D|C}$) values for four single extreme outlier discordancy tests as a function of $\varepsilon$.
Figure 9: Power of Test ($\Omega$) as a function of $\delta$ from −20 to +20 for all tests N2, N8, N14, and N15. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.
Figure 10: Power of Test ($\Omega$) as a function of $\varepsilon$ from −1 to −200 and +1 to +200 for all tests N2, N8, N14, and N15. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.
Figure 11: Test Performance Criterion ($\pi_{D|C}$, or Conditional Power P5) as a function of $\delta$ from −20 to +20 for all tests N2, N8, N14, and N15. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.
Figure 12: Test Performance Criterion ($\pi_{D|C}$, or Conditional Power P5) as a function of $\varepsilon$ from −1 to −200 and +1 to +200 for all tests N2, N8, N14, and N15. (a) $n = 5$; (b) $n = 10$; (c) $n = 15$; and (d) $n = 20$.

The performance differences of the four tests are now briefly discussed in terms of both $\Omega$ and $\pi_{D|C}$ as well as $\Delta$. The total uncertainty values of the simulations are extremely small (the error is at the fifth or even sixth decimal place; Tables 3–6). Therefore, most differences among the tests ($\Delta_{N8}$ for test N8, $\Delta_{N14}$ for test N14, and $\Delta_{N15}$ for test N15; all percent differences are with respect to test N2; see (7)) are statistically significant (Tables 3–6). A negative value of $\Delta_{Nj}$ (where $Nj$ stands for N8, N14, or N15) means that the $\Omega$ or $\pi_{D|C}$ value for a given test (N8, N14, or N15) is less than that of test N2, implying a worse performance of the given test as compared to test N2, whereas a positive value of $\Delta_{Nj}$ signifies just the opposite. Note that test N2 is chosen as a reference test, because it shows generally the best performance (values of $\Delta_{Nj}$ are mostly negative in Tables 3–6). Additional fine-scale simulations were also carried out for those $\delta$ and $\varepsilon$ values at which both $\Omega$ and $\pi_{D|C}$ become about 0.5 for the reference test N2 (0.5 is about half of the maximum value of one for $\Omega$ or $\pi_{D|C}$). Hence, the values of $\Omega$ and $\pi_{D|C}$ can be visually compared in Tables 3–6 (see the rows in italic font).

For $n = 5$, all tests show rather similar performance, because the maximum difference ($\Delta_{Nj}$) is only about −1.1% for N8 (as compared to N2) and less than 0.1% in magnitude for N14 and N15 (see the first set of rows corresponding to $n = 5$ in Tables 3–6). Where test N2 shows $\Omega$ of about 0.5, tests N8, N14, and N15 have $\Omega$ values of 0.49503, 0.50014, and 0.50015, respectively (Table 3). The respective $\Delta_{Nj}$ values are about −1.1%, −0.06%, and −0.06% (Table 3). Practically the same results are valid for $\pi_{D|C}$ as well (see the row in italic font in Table 4). Similar results were documented for $\Omega$ and $\pi_{D|C}$ as a function of $\varepsilon$ (see the corresponding rows in Tables 5 and 6, resp.).

For $n = 10$, Dixon test N8 becomes considerably less efficient than Grubbs test N2, because the $\Delta_{N8}$ values become as low as −7.8% for $\delta$ or −6.4% for $\varepsilon$ (Tables 3–6). Skewness test N14 also shows slightly lower $\Omega$ and $\pi_{D|C}$ than N2 (Tables 3–6). Kurtosis test N15 shows a performance similar to that of test N2; the maximum difference is about −0.7% (Tables 3–6). For $n = 10$, where test N2 shows $\Omega$ (or $\pi_{D|C}$) of about 0.5, the other three tests (N8, N14, and N15) show $\Delta_{Nj}$ values of about −7.8%, −1.8%, and −0.7% (Tables 3 and 4). Similarly, for such cases, $\Omega$ and $\pi_{D|C}$ show $\Delta_{N8}$, $\Delta_{N14}$, and $\Delta_{N15}$ values of about −4.3%, −1.0%, and −0.4%, respectively.

For $n = 15$ and $n = 20$, test N8 shows the worst performance and the $\Delta_{N8}$ values become as large as −12.2% to −15.5% for $\delta$ (Tables 3 and 4) or −9.8% to −11.5% for $\varepsilon$ (Tables 5 and 6). For these sample sizes, test N14 also shows a worse performance as compared to N2, because the maximum differences represented by $\Delta_{N14}$ values are about −6.3% to −10.9% for $\delta$ (Tables 3 and 4) or −4.5% to −7.0% for $\varepsilon$ (Tables 5 and 6). Test N15 shows a comparable performance, because the maximum differences ($\Delta_{N15}$ values) are about −1.5% to −2.4% for $\delta$ (Tables 3 and 4) or −1.1% to −1.5% for $\varepsilon$ (Tables 5 and 6). For $n = 15$ and $n = 20$, when test N2 shows $\Omega$ or $\pi_{D|C}$ of about 0.5, the $\Delta_{N8}$, $\Delta_{N14}$, and $\Delta_{N15}$ values range from about −6.9% to −15.0%, −3.0% to −9.2%, and −0.6% to −1.9%, respectively.

The significantly lower $\Omega$ and $\pi_{D|C}$ values of the Dixon test N8 as compared to the Grubbs test N2, skewness test N14, and kurtosis test N15 may be related to the masking effect of the penultimate observation $x_{(n-1)}$ on $x_{(n)}$ or of $x_{(2)}$ on $x_{(1)}$ as documented by Barnett and Lewis [1]. The masking effect may also be responsible for a somewhat worse performance of N14 as compared to N2.

6.4. Final Remarks

The two performance criteria ($\Omega$ and $\pi_{D|C}$) [1, 7] used in this work provide similar estimates (Tables 3–6) and, more importantly, similar conclusions. Therefore, either of them can be used to evaluate numerous other discordancy tests for single or multiple outliers [1, 26–28]. The main result of Monte Carlo simulations concerning the performance of the single extreme outlier discordancy tests could be stated as follows: N2 ≅ N15 > N14 > N8.

Additional simulation work is required to evaluate other discordancy tests, such as the single upper or lower outlier tests, as well as more complex statistical contamination involving two or more discordant outliers and the comparison of consecutive application of single outlier discordancy tests with multiple outlier tests [1, 7, 26–28]. Then, the multiple test method, initially proposed by Verma [29] and used by many researchers [30–35], would be substantially improved for subsequent applications. These performance results could then be incorporated in new versions of the computer programs DODESSYS [36], TecD [37], and UDASYS [38].

7. Conclusions

Our simulation study clearly shows that Dixon test N8 performs less well than the other three extreme single outlier tests (Grubbs N2, skewness N14, and kurtosis N15). Both performance parameters (the Power of Test $\Omega$ and Test Performance Criterion $\pi_{D|C}$) are up to about 16% lower for N8 than for test N2. Test N8, therefore, shows the worst performance for outlier detection. For certain values of $\delta$ or $\varepsilon$, test N14 also shows lower values of $\Omega$ and $\pi_{D|C}$ than N2, which means that N14 is also somewhat worse than N2. The other two tests (N2 and N15) could be considered comparable in their performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The computing facilities for this work were partly from the DGAPA-PAPIIT project IN104813. The second author (Lorena Díaz-González) acknowledges PROMEP support to the project “Estadística computacional para el tratamiento de datos experimentales” (PROMEP/103-5/10/7332). The third author (Mauricio Rosales-Rivera) thanks the Sistema Nacional de Investigadores (Mexico) for a scholarship that enabled him to participate in this research as Ayudantes de Investigador Nacional Nivel III of the first author (Surendra P. Verma).

References

  1. V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, UK, 3rd edition, 1994.
  2. W. J. Dixon, “Analysis of extreme values,” Annals of Mathematical Statistics, vol. 21, no. 4, pp. 488–506, 1950.
  3. T. S. Ferguson, “Rules for rejection of outliers,” Revue de l'Institut International de Statistique, vol. 29, no. 3, pp. 29–43, 1961.
  4. S. S. Shapiro, M. B. Wilk, and H. J. Chen, “A comparative study of various tests for normality,” Journal of American Statistical Association, vol. 63, no. 324, pp. 1343–1371, 1968.
  5. D. M. Hawkins, “Analysis of three tests for one or two outliers,” Statistica Nederlandica, vol. 32, no. 3, pp. 137–148, 1978.
  6. P. Prescott, “Examination of the behavior of tests for outliers when more than one outlier is present,” Applied Statistics, vol. 27, no. 1, pp. 10–25, 1978.
  7. K. Hayes and T. Kinsella, “Spurious and non-spurious power in performance criteria for tests of discordancy,” Journal of the Royal Statistical Society Series D, vol. 52, no. 1, pp. 69–82, 2003.
  8. G. L. Tietjen and R. H. Moore, “Some Grubbs-type statistics for the detection of several outliers,” Technometrics, vol. 14, no. 3, pp. 583–597, 1972.
  9. B. Rosner, “On the detection of many outliers,” Technometrics, vol. 17, no. 2, pp. 221–227, 1975.
  10. J. P. Royston, “The W test for normality,” Journal of the Royal Statistical Society C, vol. 31, no. 2, pp. 176–180, 1982.
  11. J. S. Simonoff, “The breakdown and influence properties of outlier rejection-plus-mean procedures,” Communications in Statistics, vol. 16, no. 6, pp. 1749–1760, 1987.
  12. C. E. Efstathiou, “Estimation of type I error probability from experimental Dixon's “Q” parameter on testing for outliers within small size data sets,” Talanta, vol. 69, no. 5, pp. 1068–1071, 2006.
  13. R. Gottardo, A. E. Raftery, K. Y. Yeung, and R. E. Bumgarner, “Quality control and robust estimation for cDNA microarrays with replicates,” Journal of the American Statistical Association, vol. 101, no. 473, pp. 30–40, 2006.
  14. S. Khedhiri and G. E. Montasser, “The effects of additive outliers on the seasonal KPSS test: a Monte Carlo analysis,” Journal of Statistical Computation and Simulation, vol. 80, no. 6, pp. 643–651, 2010.
  15. P. A. Patel and J. S. Patel, “A monte carlo comparison of some variance estimators of the Horvitz-Thompson estimator,” Journal of Statistical Computation and Simulation, vol. 80, no. 5, pp. 489–502, 2010.
  16. H. A. Noughabi and N. R. Arghami, “Monte carlo comparison of seven normality tests,” Journal of Statistical Computation and Simulation, vol. 81, no. 8, pp. 965–972, 2011.
  17. K. Krishnamoorthy and X. Lian, “Closed-form approximate tolerance intervals for some general linear models and comparison studies,” Journal of Statistical Computation and Simulation, vol. 82, no. 4, pp. 547–563, 2012.
  18. S. P. Verma, “Geochemometrics,” Revista Mexicana de Ciencias Geológicas, vol. 29, no. 1, pp. 276–298, 2012.
  19. S. P. Verma, A. Quiroz-Ruiz, and L. Díaz-González, “Critical values for 33 discordancy test variants for outliers in normal samples up to sizes 1000, and applications in quality control in Earth Sciences,” Revista Mexicana de Ciencias Geologicas, vol. 25, no. 1, pp. 82–96, 2008.
  20. S. P. Verma and A. Quiroz-Ruiz, “Corrigendum to Critical values for 22 discordancy test variants for outliers in normal samples up to sizes 100, and applications in science and engineering [Revista Mexicana de Ciencias Geologicas, vol. 23, pp. 302–319, 2006],” Revista Mexicana de Ciencias Geologicas, vol. 28, no. 1, p. 202, 2011.
  21. S. P. Verma and A. Quiroz-Ruiz, “Critical values for six Dixon tests for outliers in normal samples up to sizes 100, and applications in science and engineering,” Revista Mexicana de Ciencias Geologicas, vol. 23, no. 2, pp. 133–161, 2006.
  22. R. Cruz-Huicochea and S. P. Verma, “New critical values for F and their use in the ANOVA and Fisher’s F tests for evaluating geochemical reference material granite G-2 (U.S.A.) and igneous rocks from the Eastern Alkaline Province (Mexico),” Journal of Iberian Geology, vol. 39, no. 1, pp. 13–30, 2013.
  23. S. P. Verma and R. Cruz-Huicochea, “Alternative approach for precise and accurate Student's t critical values in geosciences and application in geosciences,” Journal of Iberian Geology, vol. 39, no. 1, pp. 31–56, 2013.
  24. F. E. Grubbs, “Procedures for detecting outlying observations in samples,” Technometrics, vol. 11, no. 1, pp. 1–21, 1969.
  25. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw Hill, Boston, Mass, USA, 3rd edition, 2000.
  26. S. P. Verma, L. Díaz-González, and R. González-Ramírez, “Relative efficiency of single-outlier discordancy tests for processing geochemical data on reference materials and application to instrumental calibrations by a weighted least-squares linear regression model,” Geostandards and Geoanalytical Research, vol. 33, no. 1, pp. 29–49, 2009.
  27. S. P. Verma, Estadística Básica para el Manejo de Datos Experimentales: Aplicación en la Geoquímica (Geoquimiometría), UNAM, México, DF, 2005.
  28. R. González-Ramírez, L. Díaz-González, and S. P. Verma, “Eficiencia relativa de 15 pruebas de discordancia con 33 variantes aplicadas al procesamiento de datos geoquímicos,” Revista Mexicana de Ciencias Geologicas, vol. 26, no. 2, pp. 501–515, 2009.
  29. S. P. Verma, “Sixteen statistical tests for outlier detection and rejection in evaluation of international geochemical reference materials: example of microgabbro PM-S,” Geostandards Newsletter, vol. 21, no. 1, pp. 59–75, 1997.
  30. E. Gómez-Arias, J. Andaverde, E. Santoyo, and G. Urquiza, “Determinación de la viscosidad y su incertidumbre en fluidos de perforación usados en la construcción de pozos geotérmicos: aplicación en el campo de Los Humeros, Puebla, México,” Revista Mexicana de Ciencias Geológicas, vol. 26, no. 2, pp. 516–529, 2009.
  31. S. G. Marroquín-Guerra, F. Velasco-Tapia, and L. Díaz-González, “Evaluación estadística de materiales de referencia geoquímica del Centre de Recherches Pétrographiques et Géochimiques (Francia) aplicando un esquema de detección y eliminación de valores desviados,” Revista Mexicana de Ciencias Geológicas, vol. 26, no. 2, pp. 530–542, 2009.
  32. K. Pandarinath, “Evaluation of geochemical sedimentary reference materials of the Geological Survey of Japan (GSJ) by an objective outlier rejection statistical method,” Revista Mexicana de Ciencias Geologicas, vol. 26, no. 3, pp. 638–646, 2009.
  33. K. Pandarinath, “Clay minerals in SW Indian continental shelf sediment cores as indicators of provenance and palaeomonsoonal conditions: a statistical approach,” International Geology Review, vol. 51, no. 2, pp. 145–165, 2009.
  34. K. Pandarinath and S. K. Verma, “Application of four sets of tectonomagmatic discriminant function based diagrams to basic rocks from northwest Mexico,” Journal of Iberian Geology, vol. 39, no. 1, pp. 181–195, 2013.
  35. M. P. Verma, “IAEA inter-laboratory comparisons of geothermal water chemistry: critiques on analytical uncertainty, accuracy, and geothermal reservoir modelling of Los Azufres, Mexico,” Journal of Iberian Geology, vol. 39, no. 1, pp. 57–72, 2013.
  36. S. P. Verma and L. Díaz-González, “Application of the discordant outlier detection and separation system in the geosciences,” International Geology Review, vol. 54, no. 5, pp. 593–614, 2012.
  37. S. P. Verma and M. A. Rivera-Gómez, “New computer program TecD for tectonomagmatic discrimination from discriminant function diagrams for basic and ultrabasic magmas and its application to ancient rocks,” Journal of Iberian Geology, vol. 39, no. 1, pp. 167–179, 2013.
  38. S. P. Verma, R. Cruz-Huicochea, and L. Díaz-González, “Univariate data analysis system: deciphering mean compositions of island and continental arc magmas, and influence of underlying crust,” International Geology Review, vol. 55, no. 15, pp. 1922–1940, 2013.