Abstract

The purpose of a multidimensional uniformity test is to check whether the underlying probability distribution of a multidimensional population differs from the multidimensional uniform distribution. Such tests have applications in fields such as biology, astronomy, and computer science, but they have received less attention in the literature than the univariate case. This paper proposes a new test statistic for checking multidimensional uniformity and discusses some of its important properties. As a special case, the bivariate test statistic is discussed in detail. Monte Carlo simulation is used to compare the power of the proposed test with that of the distance-to-boundary test, a recently published statistical test for multidimensional uniformity. The simulations show that the proposed test is more powerful than the distance-to-boundary test in some cases.

1. Introduction

Testing uniformity in the univariate case has been studied by many researchers, whereas the multidimensional uniformity test has received less attention in the literature. Testing whether a pattern of points in multidimensional space is uniformly distributed has applications in many fields such as biology, astronomy, and computer science. A commonly used goodness-of-fit test for uniformity is the chi-square test [1]. In principle, the chi-square test can be applied to any multivariate distribution. However, the choice of cell limits is arbitrary, and the power of the chi-square test is usually low. Other well-known univariate goodness-of-fit tests are the Kolmogorov-Smirnov test [2, 3], the Anderson-Darling test [4], and the Cramer-von Mises test [5].

Justel et al. [6] proposed a multivariate goodness-of-fit test based on the idea of the Kolmogorov-Smirnov test. Using Rosenblatt's transformation, they reduced the multivariate case to the univariate case. Their test statistic is distribution free and can be applied in any dimension, but its computation is complicated, especially in more than two dimensions. Liang et al. [7] proposed several tests of uniformity in the multivariate case, based on number-theoretic and quasi-Monte Carlo methods for measuring the discrepancy of points in the multidimensional unit cube. Berrendero et al. [8] proposed a test based on the distance to the boundary. Monte Carlo simulation showed that the distance-to-boundary test is more powerful than the tests proposed by Liang et al. [7].

Chen and Ye [9] developed an alternative test for uniformity in the univariate case. In that paper, the authors proposed a test statistic based on the order statistics in support set $[0, 1]$. The test statistic proposed in that paper is

Monte Carlo simulation results showed that the test proposed in that paper is more powerful than the commonly used Kolmogorov-Smirnov test when the alternative distribution is a V-shape distribution or when the sample size is small. By applying the probability integral transformation, the uniformity test can be used to check whether the underlying distribution follows any specified distribution. The idea is adopted in this paper to develop a test for the multidimensional case.

The main purpose of this paper is to propose a new test statistic for testing multidimensional uniformity, with the aim of improving the power of multidimensional uniformity tests. Since the distance-to-boundary test is a recently published test for the multivariate case, the power of the test proposed in this paper is compared with the power of the distance-to-boundary test. Although the test can be used in any dimension, the discussion focuses mainly on the bivariate case. Some techniques from nonparametric statistics are adopted to modify the test statistic in order to raise the power of the test in the bivariate case.

2. New Test Statistic

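Suppose $X_1, X_2, \ldots, X_n$, where $X_i = (X_{i1}, \ldots, X_{id})$, form a random sample from a $d$-dimensional population distribution with support set $[0, 1]^d$. Here $[0, 1]^d$ is the $d$-dimensional unit cube, which is the set defined as

$$[0, 1]^d = \{(x_1, \ldots, x_d) : 0 \le x_j \le 1,\ j = 1, \ldots, d\}.$$

Suppose also that, for each coordinate $j$, $X_{(1)j} \le X_{(2)j} \le \cdots \le X_{(n)j}$ are the ordered values of $X_{1j}, X_{2j}, \ldots, X_{nj}$. The purpose is to test the following:

$H_0$: the population distribution is the uniform distribution on $[0, 1]^d$;
$H_1$: the population distribution is not the uniform distribution on $[0, 1]^d$.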

The test statistic proposed in this paper is defined as

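Here it is assumed that and .

It can be seen that if the underlying distribution is a uniform distribution on $[0, 1]^d$, then

Therefore, if the value of the test statistic is too far away from zero, it could be an indication that the underlying distribution is not the uniform distribution on $[0, 1]^d$. This motivates the following test procedure. Under $H_0$, let $c_\alpha$ be a number such that the probability that the test statistic exceeds $c_\alpha$ is $\alpha$. Then $H_0$ should be rejected at significance level $\alpha$ if the observed value of the test statistic exceeds $c_\alpha$. It can be shown that the test statistic is always between 0 and 1. In fact,

It can be found from above that the test statistic can also be rewritten as

As mentioned above, this paper will mainly discuss the bivariate case. Suppose $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ form a random sample from a bivariate population distribution with support set $[0, 1]^2$. Suppose also that $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the ordered values of $X_1, X_2, \ldots, X_n$, and $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$ are the ordered values of $Y_1, Y_2, \ldots, Y_n$. The purpose is to test the following:

$H_0$: the population distribution is the uniform distribution on $[0, 1]^2$;
$H_1$: the population distribution is not the uniform distribution on $[0, 1]^2$.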
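As a rough illustration of a statistic of this kind, the following Python sketch computes an averaged squared-spacing statistic for points in the unit cube. The formula is an assumption made for illustration: it treats each coordinate separately, pads the ordered values with 0 and 1, compares each spacing with 1/(n + 1), and scales the sum so that the result lies between 0 and 1. The function name `spacing_statistic` and the normalization are hypothetical and may differ from the definition intended in this paper.

```python
def spacing_statistic(sample):
    """Averaged squared-spacing statistic for points in [0, 1]^d.

    Assumed form (an illustration, not necessarily the paper's definition):
        (n + 1) / (n * d) * sum over coordinates j and spacings i of
        (spacing_ij - 1 / (n + 1)) ** 2
    where the spacings of coordinate j are the gaps between consecutive
    ordered values after padding with 0 on the left and 1 on the right.
    """
    n = len(sample)
    d = len(sample[0])
    expected_gap = 1.0 / (n + 1)
    total = 0.0
    for j in range(d):
        # Ordered values of coordinate j, padded with the cube's end points.
        ordered = [0.0] + sorted(point[j] for point in sample) + [1.0]
        for left, right in zip(ordered, ordered[1:]):
            total += (right - left - expected_gap) ** 2
    return (n + 1) / (n * d) * total
```

Under this assumed form, a sample whose coordinates are evenly spaced gives 0, and a sample with all points at a single corner of the cube gives 1, so large values point away from uniformity.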

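In order to raise the power of the test, the test statistic defined in (4) is adjusted by adopting Kendall's $\tau$ statistic. See, for example, Conover [10]. For any pair of points $(x_i, y_i)$ and $(x_j, y_j)$ in the two-dimensional space, define the following items. $(x_i, y_i)$ and $(x_j, y_j)$ are said to be concordant if $(y_j - y_i)/(x_j - x_i) > 0$. $(x_i, y_i)$ and $(x_j, y_j)$ are said to be discordant if $(y_j - y_i)/(x_j - x_i) < 0$. $(x_i, y_i)$ and $(x_j, y_j)$ are said to be half concordant and half discordant if $(y_j - y_i)/(x_j - x_i) = 0$.

No comparison is made if $x_i = x_j$.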

Let $N_c$ be the total number of concordant pairs and let $N_d$ be the total number of discordant pairs.
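The pair counts can be sketched in Python as follows. This is a minimal illustration that assumes the usual tie convention for Kendall's statistic: a pair with different first coordinates and equal second coordinates contributes one half to each count, and a pair with equal first coordinates is skipped. The function name `concordance_counts` is hypothetical.

```python
from itertools import combinations


def concordance_counts(points):
    """Return (concordant, discordant) pair counts for bivariate points.

    Assumed tie handling: equal second coordinates count as half concordant
    and half discordant; equal first coordinates give no comparison.
    """
    concordant = 0.0
    discordant = 0.0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # no comparison is made for this pair
        # Same sign as the slope (y2 - y1) / (x2 - x1) because x1 != x2.
        direction = (y2 - y1) * (x2 - x1)
        if direction > 0:
            concordant += 1.0
        elif direction < 0:
            discordant += 1.0
        else:
            concordant += 0.5
            discordant += 0.5
    return concordant, discordant
```

The double loop costs on the order of n squared comparisons, which is acceptable for the small and moderate sample sizes typical of a simulation study.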

Define Here it is assumed that and . It will be shown below that the inclusion of the term can raise the power of the test significantly when the two variables of the alternative bivariate distribution are correlated.

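It can be seen that if the underlying distribution is a uniform distribution on $[0, 1]^2$, then

Under $H_0$, let $c_\alpha$ be a number such that the probability that the adjusted test statistic exceeds $c_\alpha$ is $\alpha$. Then $H_0$ should be rejected at significance level $\alpha$ if the observed value of the adjusted test statistic exceeds $c_\alpha$.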

It can be shown that for any .

To show , note that Combining the above result with the fact that is always between 0 and 1, one can then conclude that .

It should be mentioned that the lower and upper bounds of the above inequality cannot be improved. In fact, one may easily construct bivariate data sets such that the values of will reach 0 and 1, respectively.

3. Critical Values and Power Comparison

Monte Carlo simulation is used to find the critical values of the test statistic described in (10). To accomplish this, 10,000,000 pseudorandom samples of size $n$ are generated from the two-dimensional uniform distribution on $[0, 1]^2$ for each sample size $n$ considered. The critical values of the test statistic are tabulated in Table 1. The first column gives the sample sizes and the first row gives the significance levels. Each entry of the table is the critical value corresponding to the given sample size and significance level.
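The simulation of critical values can be sketched as follows. This is an illustration under stated assumptions, not the code used for Table 1: the statistic is passed in as a function, `toy_statistic` is a hypothetical stand-in used only to make the example runnable, and the default number of replications is far smaller than the 10,000,000 used in the paper.

```python
import random


def toy_statistic(sample):
    """Stand-in statistic (hypothetical): largest deviation of a coordinate mean from 0.5."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    return max(abs(mean_x - 0.5), abs(mean_y - 0.5))


def simulate_critical_value(statistic, n, alpha, reps=2000, seed=0):
    """Approximate the upper-alpha critical value of `statistic` under uniformity.

    Draws `reps` samples of size n from the uniform distribution on the unit
    square and returns an empirical (1 - alpha) quantile of the statistic.
    """
    rng = random.Random(seed)
    values = sorted(
        statistic([(rng.random(), rng.random()) for _ in range(n)])
        for _ in range(reps)
    )
    # Roughly a fraction alpha of the simulated values lie above this entry.
    index = min(reps - 1, int((1.0 - alpha) * reps))
    return values[index]
```

A call such as `simulate_critical_value(toy_statistic, 10, 0.05)` returns one table entry; repeating it over a grid of sample sizes and significance levels fills a table of the same layout as Table 1.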

The power of the test proposed in this paper is compared with that of the recently published distance-to-boundary test by Berrendero et al. [8], because the distance-to-boundary test has been shown to perform well in many cases. For convenience, the test statistic proposed in this paper is denoted as the test for the rest of the paper. Several alternative distributions are selected for the power comparison; they can be classified into two types. The first type consists of bivariate distributions based on two independent univariate Beta distributions. The second type is the so-called metatype uniform distribution.

3.1. Alternative Bivariate Distributions Based on Independent Beta Distribution

The probability density function of the univariate Beta distribution is

$$f(x; a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, x^{a - 1} (1 - x)^{b - 1}, \quad 0 < x < 1,\ a > 0,\ b > 0.$$

The Beta family is flexible: different shapes are obtained by selecting the parameters $a$ and $b$. The bivariate alternative distributions used in the power comparison are formed by two independent univariate Beta distributions.
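A power estimate against such an alternative can be sketched in Python as follows. This is a minimal illustration: the function names are hypothetical, the Beta parameters `a` and `b` are left to the caller because the values used in the paper are not assumed here, and the statistic and its critical value are supplied from outside.

```python
import random


def beta_pair_sample(n, a, b, rng):
    """Draw n points whose two coordinates are independent Beta(a, b) variables."""
    return [(rng.betavariate(a, b), rng.betavariate(a, b)) for _ in range(n)]


def estimate_power(statistic, critical_value, sampler, reps=1000):
    """Fraction of simulated samples for which the statistic exceeds the critical value.

    `sampler` is a zero-argument function returning one sample drawn from the
    alternative distribution.
    """
    rejections = sum(
        1 for _ in range(reps) if statistic(sampler()) > critical_value
    )
    return rejections / reps


# Example call (all argument values are illustrative placeholders):
#   rng = random.Random(0)
#   power = estimate_power(my_statistic, my_critical_value,
#                          lambda: beta_pair_sample(20, 2.0, 3.0, rng))
```

Keeping the sampler separate from the power routine lets the same function be reused for every alternative distribution and for both tests being compared.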

3.1.1. Alternative Distribution 1

The bivariate Beta distribution is formed by two independent marginal distributions. Figure 1 shows the power comparison between the test and distance-to-boundary test. It can be seen that the test is more powerful than the distance-to-boundary test in this case.

3.1.2. Alternative Distribution 2

The bivariate Beta distribution is formed by two independent marginal distributions. Figure 2 shows the power comparison between the test and the distance-to-boundary test. It can be seen that the test is more powerful when the sample size is less than 25. As the sample size increases, the powers of the two tests become close, with the distance-to-boundary test slightly better.

3.1.3. Alternative Distribution 3

The bivariate Beta distribution is formed by two independent marginal distributions. Figure 3 shows the power comparison between the test and the distance-to-boundary test. It can be seen that the distance-to-boundary test performs better in this case. This is probably because the marginal distribution is symmetric. The symmetric situation is discussed in Berrendero et al. [8], where the power of the distance-to-boundary test was shown to be higher. When the symmetry condition does not hold, the result appears to be different.

3.2. Metatype Uniform Distribution

The metatype uniform distribution was used in the papers of Liang et al. [7] and Berrendero et al. [8], where it was introduced for the purpose of power comparison.

The basic idea for creating a metatype multivariate distribution is as follows. Let the random vector $(X, Y)$ have a distribution function $F(x, y)$. We define $F_X$ as the marginal distribution function of $X$ and $F_Y$ as the marginal distribution function of $Y$. Then define the random vector $(U, V)$ as $(U, V) = (F_X(X), F_Y(Y))$. By the probability integral transformation, $U$ and $V$ are each uniformly distributed on the support set $[0, 1]$, but the joint distribution differs from the uniform distribution on $[0, 1]^2$ because $U$ and $V$ are not independent. This kind of multivariate distribution is easily generated by any software and is useful for checking tests of multivariate uniformity. Specifically, two metatype uniform distributions are considered in the power study.
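The construction above can be sketched in Python for the normal case. This is an illustration under stated assumptions: the margins are taken to be standard normal, the correlation `rho` is a free parameter whose value in the paper is not assumed here, and the function name `metatype_normal_sample` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def metatype_normal_sample(n, rho, rng):
    """Draw n points (U, V) with uniform margins and normal-type dependence.

    (Z1, Z2) is bivariate normal with standard normal margins and
    correlation rho; each coordinate is mapped through the standard
    normal distribution function.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Mix in an independent normal draw so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        points.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return points
```

Each coordinate on its own looks uniform, so a test that examines only the margins cannot detect this alternative; the dependence between the coordinates is the only departure from uniformity on the unit square.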

3.2.1. Alternative Distribution 4

MNU is obtained from the bivariate normal distribution with and . For consistency of comparison, the same parameters are chosen as in Berrendero et al. [8]. The power comparison under this metatype uniform distribution is shown in Figure 4. The test is more powerful than the distance-to-boundary test in this case. As the sample size increases, the power of the test increases, whereas the power of the distance-to-boundary test changes little.

3.2.2. Alternative Distribution 5

MTU is obtained from the bivariate Student's $t$ distribution with and and 5 degrees of freedom. The power comparison under this metatype uniform distribution is shown in Figure 5. The test is more powerful than the distance-to-boundary test in this case. As the sample size increases, the power of the test increases, whereas the power of the distance-to-boundary test changes little.

4. Conclusion and Discussion

In this paper, a new multidimensional uniformity test is proposed. The basic idea comes from the univariate uniformity test of Chen and Ye [9]. The method is extended to the multidimensional case, and the bivariate case is discussed in detail. The new test can be used to test whether an underlying multivariate probability distribution differs from a uniform distribution. Critical values of the bivariate uniformity test are calculated, and a power study is performed by comparison with a recently published multivariate uniformity test.

The distance-to-boundary test is a recently published multivariate uniformity test by Berrendero et al. [8]. The power comparison shows that the new test proposed in this paper is more powerful than the distance-to-boundary test in some cases. In particular, the new test is more powerful when the marginal distributions of the alternative distribution are not symmetric and are independent and identically distributed. Metatype uniform distributions, generated from the normal distribution and the Student's $t$ distribution, are also used for the power comparison. The power study shows that the new test is more powerful than the distance-to-boundary test in these cases.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors sincerely thank two anonymous referees for their comments and suggestions that greatly improved the quality of the paper.