Not Significant: What Now?

Marinell, Gerhard; Steckel-Berger, Gabriele; Ulmer, Hanno

doi:https://doi.org/10.1155/2012/804691

Journal of Probability and Statistics


Research Article | Open Access

Volume 2012 | Article ID 804691 | https://doi.org/10.1155/2012/804691

Gerhard Marinell,¹ Gabriele Steckel-Berger,¹ and Hanno Ulmer²

Academic Editor: Man Lai Tang

Received 25 Jul 2012

Accepted 12 Oct 2012

Published 08 Nov 2012

Abstract

In a classic significance test based on a random sample of size $n$, a $p$ value is calculated at size $n$ with the aim of rejecting the null hypothesis. The sample of size $n$, however, can retrospectively be divided into partial samples, and a test of significance can be calculated for each partial sample. As a result, several partial samples will provide significant $p$ values whereas others will not. In this paper, we propose a significance test that takes into account the additional information from the $p$ values of the partial samples of a random sample. We show that these $p$ values can greatly modify the results of a classic significance test.

1. Introduction

In this day and age testing for significance has become a ritual which, if it leads to a significant result, still opens the doors to many well-known journals in nearly every scientific field. This is the case even though for a long time the application of null hypothesis significance testing has been criticized and even rejected [1]. What will be shown here is that by extending the classic significance test additional information from a random sample can be obtained and a “not significant” result can possibly be made “significant”. A misuse of null hypothesis significance testing can however not be prevented with this method [2].

In the significance test as defined by Fisher [3, 4], the probability that a specific sample will occur is calculated on the assumption that the null hypothesis is valid. This probability is usually abbreviated as $p$ and is compared with a conventionally determined level of significance, normally 5%. When $p$ is equal to or less than this level of significance, the null hypothesis is rejected. If this is not the case then, as defined by Fisher, the null hypothesis cannot be rejected but also cannot be accepted [5]. This procedure is valid for a given sample provided it is a random sample. This means that the units of the sample are drawn from the population randomly and that the probability with which a unit is drawn from the population is given. If one presupposes, as is customary, a simple random sample (the “iid” assumption, that is, the “independent and identically distributed” assumption), then each unit has the same probability of being part of the sample and the drawings from the population occur independently of each other.

Even though the randomness of the sample is a premise of a test of significance, it is seldom verified. Additionally, the “iid” assumption requires that the order of the drawings from the population is known, and this allows a random sample to be split into a series of smaller subsamples. The sample of size $n$ can thus be retrospectively divided into partial samples, and a test of significance can be calculated for each partial sample. As a result, some partial samples will provide significant $p$ values, others will not, and in a third category of cases nearly all partial samples will provide significant $p$ values.
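The retrospective split into partial samples can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes 0/1 draws from a binomial setting, an exact one-sided test against a hypothetical null value `pi0`, and made-up data; the function names are likewise illustrative.

```python
from math import comb

def binom_upper_tail(successes: int, m: int, pi0: float) -> float:
    """Exact one-sided p value P(X >= successes) for X ~ Binomial(m, pi0)."""
    return sum(comb(m, k) * pi0**k * (1 - pi0)**(m - k)
               for k in range(successes, m + 1))

def partial_sample_p_values(draws, pi0=0.5):
    """p value of every partial sample 1..n, keeping the drawing order fixed."""
    p_values = []
    successes = 0
    for m, x in enumerate(draws, start=1):
        successes += x                      # running count of successes
        p_values.append(binom_upper_tail(successes, m, pi0))
    return p_values

# Hypothetical 0/1 draws in drawing order (not the data from the appendix).
draws = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
for m, p in enumerate(partial_sample_p_values(draws), start=1):
    print(f"partial sample 1..{m:2d}: p = {p:.3f}")
```

Because the drawing order is kept, a sample of size $n$ yields exactly $n$ partial samples, each nested in the next.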

2. Illustrative Examples

A series of examples with randomly drawn samples illustrates the typical situations. In a first example, the null hypothesis is to be tested at a significance level of 5% with the help of a significance test and a random sample of size $n$. If a random sample of this size is created by a random generator for a binomially distributed random variable with the “true” value 0.55, then the corresponding random sample, with a $p$ value of 0.100, does not suggest a significant result. However, if the partial samples 1 to $n$ of this random sample are examined, we get a different picture. The $p$ values of these partial samples are depicted in Figure 1. One can see that the $p$ values lie both above and below the level of significance. Yet in the case of , $p$ is clearly above the 5% level, which is similar to what can be seen at .

In this specific case we know, however, that the true value is 0.55; hence the null hypothesis does not apply, but rejecting it on the basis of the given sample is not possible. Rejection would nevertheless be possible if one took, for example, only the first 28 units into consideration. The resulting $p$ value (0.044) is smaller than the significance level of 5%, and the null hypothesis can be rejected. Such a rejection, however, requires that the information of the two following units be ignored. A method that simply ignores valid information is not sensible and can hardly be accepted. In our approach we therefore do not intend to ignore valid information, as we show below. We try to capture all the information that comes with a validly drawn sample. The null hypothesis in our example can only be rejected if all $p$ values from the subsamples are considered.
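The two values quoted in this example can be checked by hand under a few assumptions that the text as extracted does not state: a null value of 0.5, an exact one-sided binomial test, a full sample of 30 units (the first 28 plus the two that follow), and 19 successes among the first 28 units with none in the last two. These are inferences chosen because they reproduce the reported numbers, not values confirmed by the source.

```python
from math import comb

def upper_tail_fair(successes: int, m: int) -> float:
    """P(X >= successes) for X ~ Binomial(m, 0.5), from exact integer counts."""
    return sum(comb(m, k) for k in range(successes, m + 1)) / 2**m

# Assumed counts: 19 successes in the first 28 units, none in units 29 and 30.
p_full = upper_tail_fair(19, 30)       # all 30 units
p_first_28 = upper_tail_fair(19, 28)   # first 28 units only

print(f"{p_full:.3f} {p_first_28:.3f}")  # prints "0.100 0.044"
```

Dropping the last two units changes the verdict at the 5% level although no success was added, which is exactly the information loss the paragraph above warns against.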

In a second example, random values were created with the help of a random generator for a “true” value equal to that of the null hypothesis. The corresponding sample of size $n$ has a $p$ value of 0.100 and thus does not indicate a significant result at the 5% level of significance. The null hypothesis cannot be rejected. The $p$ values of the possible partial samples are shown in Figure 2.

What is striking here is that the $p$ value for the partial sample of size 16 lies under the 5% level of significance and consequently leads to a rejection of the null hypothesis, even though the “true” value from which the random sample was created is the same as that of the null hypothesis. The same is true for the sample sizes 23, 24, and 26. Their $p$ values also lead to a rejection of the null hypothesis. Once again the temptation is great in these cases to report a significant result by choosing a sample size of 16, 23, 24, or 26, at which the $p$ values are significant. This would once again mean that valid information is dropped, which is unacceptable. If, in contrast, the $p$ values from all subsamples are considered, it appears likely that the null hypothesis cannot be rejected, as discussed below.
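The temptation described above can be made concrete with a small simulation. This is only a sketch under assumed settings (null value 0.5, 30 draws, exact one-sided binomial test, 2000 simulated samples), none of which is taken from the source: it compares how often a true null hypothesis is rejected at the full sample size with how often it could be "rejected" by picking a convenient partial sample size after the fact.

```python
import random
from math import comb

def upper_tail(successes, m, pi0):
    """Exact one-sided p value P(X >= successes) for X ~ Binomial(m, pi0)."""
    return sum(comb(m, k) * pi0**k * (1 - pi0)**(m - k)
               for k in range(successes, m + 1))

def rejection_rates(n=30, pi_true=0.5, pi0=0.5, alpha=0.05, n_sims=2000, seed=1):
    """Share of simulated samples rejected (a) at the full size n only and
    (b) at some partial sample size 1..n chosen after the fact."""
    rng = random.Random(seed)
    full_hits = 0
    any_hits = 0
    for _ in range(n_sims):
        successes = 0
        significant = []
        for m in range(1, n + 1):
            successes += rng.random() < pi_true   # one 0/1 draw
            significant.append(upper_tail(successes, m, pi0) <= alpha)
        full_hits += significant[-1]
        any_hits += any(significant)
    return full_hits / n_sims, any_hits / n_sims

full_rate, any_rate = rejection_rates()
print(f"rejected at n only:         {full_rate:.3f}")
print(f"rejected at some size <= n: {any_rate:.3f}")
```

The second rate can never be smaller than the first, since the full sample is itself one of the partial samples; how much larger it is depends on the assumed settings.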

A final example should clarify the situation further. Here the “true” value used to create the binomially distributed random sample differs from that of the null hypothesis. The $p$ value of the corresponding random sample of size $n$ is 0.100. The null hypothesis can therefore not be rejected at a significance level of 5%. A graph of the resulting partial samples’ $p$ values can be seen in Figure 3.

From a sample size of onwards nearly all of the $p$ values until are smaller than the level of significance. Only when is the $p$ value greater than 0.05. In this case, in contrast to the previous examples, it would not be possible to reject the null hypothesis at a sample size of , as we have a $p$ value of 0.062, which is greater than the level of significance. The same is true for . In this example again, using all $p$ values will lead to the correct result, namely, the rejection of the null hypothesis.

Consequently, we can draw the following conclusions. If, as in usual practice, only a $p$ value at $n$ is calculated, the null hypothesis cannot be rejected in any of the three examples, even though this decision is wrong in two of the three situations. A decision based solely on a random sample’s significance test at the full sample size $n$ does not take all the available information into consideration. What is missing for a well-rounded picture of any random sample is given in the partial samples’ $p$ values, as demonstrated in the examples and graphs above. The question that remains unanswered is how, based on the additional information, we can numerically distinguish between a significant rejection and a nonrejection of the null hypothesis.

3. Bootstrap

One method to include the additional information given in the partial samples’ $p$ values is the bootstrap method [6]. This method does not require a particular type of distribution of the $p$ values and still enables an estimation of the unknown distribution function of the $p$ value, including its mean and variance. Consequently, a confidence interval for the unknown “true” $p$ value can also be determined, which contains information not only about the sample of size $n$ but also about its partial samples.
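The text does not spell out how the bootstrap is applied to the partial samples’ $p$ values. One possible reading, offered only as a sketch, is a nonparametric bootstrap of their mean: resample the $n$ partial-sample $p$ values with replacement, record the mean of each resample, and summarize the resulting distribution by its mean, standard deviation, and a percentile interval. The function name, the number of resamples, the percentile interval, and the input values below are all assumptions for illustration.

```python
import random
import statistics

def bootstrap_mean_p(p_values, n_resamples=2000, seed=0):
    """Bootstrap distribution of the mean of the partial-sample p values.

    Returns (mean, standard deviation, 95% percentile interval) of the
    bootstrap means. Simple resampling with replacement is assumed here.
    """
    rng = random.Random(seed)
    n = len(p_values)
    means = []
    for _ in range(n_resamples):
        resample = [p_values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return statistics.fmean(means), statistics.stdev(means), (lower, upper)

# Hypothetical partial-sample p values (not taken from the paper).
p_values = [0.50, 0.25, 0.50, 0.31, 0.19, 0.11, 0.23, 0.14, 0.09, 0.05]
mean, sd, (lo, hi) = bootstrap_mean_p(p_values)
print(f"mean = {mean:.4f}, sd = {sd:.4f}, 95% interval = ({lo:.4f}, {hi:.4f})")
```

Note that the $p$ values of nested partial samples are not independent of one another, so treating them as exchangeable units for resampling is itself an assumption of this sketch.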

In the first example given here, the “true” value was 0.55, which created a random sample of size $n$. In this sample the null hypothesis cannot be rejected at a significance level of 5%. The $p$ value for this sample size equals 0.100. The following results are obtained if one takes into consideration not only the information from the sample of size $n$ but also that from the partial samples, calculated with the bootstrap method.

The null hypothesis can be rejected, as the sample was taken from a population with a “true” value of 0.55. The probability of this sample result if the null hypothesis is valid is no longer 0.100 but equals 0.044. The mean and standard deviation of the bootstrap distribution of $p$ values are , .

In the second example, the “true” value was the same as that of the null hypothesis and created a random sample of size $n$. In this case the null hypothesis cannot be rejected at a significance level of 5%. The $p$ value for this sample size equals 0.100. The following results are obtained if one takes into consideration not only the information from the sample of size $n$ but also that from the partial samples, calculated with the bootstrap method.

The null hypothesis cannot be rejected, as the sample was taken from a population whose “true” value equals that of the null hypothesis. The probability of this sample result if the null hypothesis is valid has risen to 0.1479. The mean and standard deviation of the bootstrap distribution of $p$ values are , .

In the last example the “true” value was . With a sample of size $n$, the null hypothesis cannot be rejected at a significance level of 5%. The $p$ value for this sample size equals 0.100. If one, however, takes into consideration not only the information from the sample of size $n$ but also that from the other sample sizes with the bootstrap method, then it is possible to reject the null hypothesis.

The probability of this sample result if the null hypothesis is valid equals 0.0448. The mean and standard deviation of the bootstrap distribution of $p$ values are , .

In contrast to our examples, one usually does not know the “true” value that created, with the help of a random generator, each of our binomially distributed random samples. Our aim was to estimate the “true” value of $p$ in our samples. It does not play a role whether we are dealing with a test problem for a or whether we are looking at the famous null hypotheses of parameters such as, for example, or or .

4. Discussion

Consequently, should the classic significance test lead to a result that is “not significant”, this does not necessarily mean that our analysis has come to an end (nor, however, does it necessarily indicate a “significant” result). The $p$ values of the partial samples can greatly modify the results of the classic significance test, as our examples have shown. This kind of significance test not only takes into consideration the information provided by a classic significance test based on a random sample of size $n$, but also gives us additional information from the $p$ values of the partial samples of a random sample. These $p$ values can be considered as realizations of a random variable (see also Boos and Stefanski [7], who likewise consider $p$ values to be random variables but pursue a different aim). According to Bernoulli’s law of large numbers, the $p$ values will converge to the true value with increasing $n$. The bootstrap method is one way to estimate the mean and variance of this random variable.

Our idea to extend classical significance testing does not describe a sequential analysis in the sense of Wald’s sequential probability ratio test [8], where, for example, the sampling can be stopped after reaching a certain decision limit. In our approach, the sample size is fixed and there is no intention to stop sampling at an earlier stage. We further do not need to consider the type II error and the corresponding effect size, since our approach relies on the Fisher model [3, 4] rather than that of Neyman and Pearson [9]. Based on the required randomness of a random sample, the sampling order is also fixed, and we do not need to account for the theoretical possibility (the sum of binomial coefficients at $n$) of analyzing more than $n$ partial subsamples. In addition, our approach does not constitute a typical multiple testing situation. A multiple testing situation usually requires the testing of multiple hypotheses. In our case, we test only one hypothesis; our main aim is to improve the estimation of the $p$ value using the information of the data in the partial subsamples. In this sense our approach can also be regarded as an estimation problem.
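The parenthetical on the sum of binomial coefficients can be made explicit. Reading it as the number of non-empty subsets of $n$ units (an interpretation, since the extracted text does not give the formula), the count is

$$\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1 .$$

For $n = 30$, the sample size suggested by the first example (the first 28 units plus the two that follow), this gives $2^{30} - 1 = 1{,}073{,}741{,}823$ possible subsamples, compared with only $n = 30$ partial samples when the drawing order is kept fixed.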

In summary, in this paper we propose a significance test that takes into account information from the $p$ values of partial subsamples of a random sample. We show that the use of these additional $p$ values can greatly modify the results of a classic significance test; the method can be seen as an “extended partial data” analysis approach to data mining.

Appendix

Realisation of Three Random Samples

(1) Sample: .
(2) Sample: .
(3) Sample: .

References

[1] J. Cohen, “The earth is round ($p < .05$),” American Psychologist, vol. 49, no. 12, pp. 997–1003, 1994.
[2] G. Loftus, “On the tyranny of hypothesis testing in the social sciences,” Contemporary Psychology, vol. 36, pp. 102–105, 1991.
[3] R. A. Fisher, The Design of Experiments (5th Ed., 1951; 7th Ed., 1960; 8th Ed., 1966), Oliver & Boyd, Edinburgh, UK, 1935.
[4] R. A. Fisher, Statistical Methods and Scientific Inference, Oliver & Boyd, Edinburgh, UK, 1956.
[5] G. Gigerenzer, “Mindless statistics,” Journal of Socio-Economics, vol. 33, no. 5, pp. 587–606, 2004.
[6] B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC, 1994.
[7] D. Boos and L. Stefanski, “P-value precision and reproducibility,” The American Statistician, vol. 65, no. 4, pp. 213–221, 2011.
[8] A. Wald, “Sequential tests of statistical hypotheses,” The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945.
[9] J. Neyman and E. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” Philosophical Transactions of the Royal Society of London. Series A, vol. 231, pp. 289–337, 1933.

Copyright

Copyright © 2012 Gerhard Marinell et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
