A Novel Goodness-of-Fit Test for Cauchy Distribution

Pekgör, A.

doi:https://doi.org/10.1155/2023/9200213

Journal of Mathematics

Research Article | Open Access

Volume 2023 | Article ID 9200213 | https://doi.org/10.1155/2023/9200213

Academic Editor: Jiancheng Jiang

Received 08 Nov 2022

Revised 21 Feb 2023

Accepted 23 Feb 2023

Published 14 Mar 2023

Abstract

Recently, several goodness-of-fit tests for the Cauchy distribution have been introduced based on the Kullback–Leibler divergence and the likelihood ratio. These tests are claimed to be more powerful than well-known goodness-of-fit tests such as Kolmogorov–Smirnov, Anderson–Darling, and Cramér–von Mises in some cases. In this study, a novel goodness-of-fit test is proposed for the Cauchy distribution, and the asymptotic null distribution of the test statistic is derived. The critical values of the proposed test are also determined through Monte Carlo simulation for different sample sizes. The power analysis shows that the proposed test is more powerful than the existing tests in certain cases.

1. Introduction

The Cauchy distribution, which has many applications, is one of the most popular heavy-tailed models in statistics. For instance, the Cauchy distribution can be used to model the points of impact of a fixed straight line of particles emitted from a point source [1], the energy of an unstable state in quantum mechanics [2], the lifetime of natural broadening [3], the observations for spinning objects [4], the value-at-risk in financial risk management [5], the polar and non-polar liquids in porous glasses [6], the energy of an unstable state [7], the contact resistivity [8], the hypocenters on focal spheres of earthquakes [9], the velocity differences induced by different vortex elements [10], and the energy width of the state [11]. Since the Cauchy distribution has a wide range of applications, goodness-of-fit (GOF) tests for the Cauchy distribution have been studied by many researchers (see, for example, [12–16]). This paper proposes a novel goodness-of-fit test for the Cauchy distribution and compares its power with that of some existing tests.

This paper is organized as follows. In Section 2, the proposed GOF test statistic for the Cauchy distribution is introduced, the critical values are simulated for different sample sizes and significance levels, and the asymptotic null distribution of the proposed test statistic is derived. An R function is provided for computing the simulated and approximate p values. The power performance of the existing and proposed test procedures is studied, and some remarks are given, in Section 3. In Section 4, two illustrative examples are given. In Section 5, some concluding remarks are provided.

2. Proposed Test Procedures

Let be a Cauchy random variable with probability density function (pdf) and cumulative distribution function (cdf) where is the location parameter and is the scale parameter. The Cauchy distribution with parameters and is called the standard Cauchy distribution. Let us consider the index where , , is the parameter vector, and is the -th quantile for any distribution. It is clear that the index in equation (3) lies in the interval and that the index is constant for all and for the Cauchy distribution with cdf in equation (2). These properties are also valid for symmetric distributions with location-scale parameters and . In other words, the index in equation (3) is not affected by location and scale shifts for symmetric distributions. In this paper, we focus on developing a GOF test for the Cauchy distribution. The index in equation (3) for the Cauchy distribution is denoted by where , is the -th quantile of the Cauchy distribution with parameters and . The choice of the vector is an issue in developing a GOF test based on the index in equation (3). For instance, choosing and as in equation (6), by maximizing the distance between the indexes of the Cauchy and normal distributions when and , can be thought of as maximizing the ability of the test to detect the difference between the Cauchy and normal distributions. For the Cauchy distribution, we consider

In equation (5), and are fixed in advance, and and are obtained as 0.02217468 and 0.97782532, respectively, by maximizing the distance with respect to and , where is the inverse cdf of the normal distribution with mean and standard deviation . Since the normal distribution is a well-known distribution with a moderate tail, the absolute difference of the indexes for the Cauchy and normal distributions is maximized in order to maximize the power against the normal alternative. The optimality of can also be discussed for discriminating between the Cauchy and other distributions; that is, the same idea can be extended to other alternative distributions. The values of the index in equation (3) for different distributions are presented in Table 1.
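The maximization described above can be reproduced numerically. The sketch below is not the author's code. It assumes that the index has the quantile-ratio form (x_p3 − x_p2) / (x_p4 − x_p1), that the two fixed inner probabilities are the quartiles 0.25 and 0.75, and that the outer pair is symmetric (p4 = 1 − p1). These assumptions are consistent with the quoted optimum 0.02217468 and with the Cauchy index value 0.069776725 reported in the text, but the function names and the search routine are illustrative.

```python
import math
from statistics import NormalDist

def cauchy_index(p1):
    """Assumed index for the standard Cauchy with p = (p1, 0.25, 0.75, 1 - p1)."""
    upper = math.tan(math.pi * (0.5 - p1))  # (1 - p1)-th quantile; quartiles are -1 and 1
    return 2.0 / (2.0 * upper)

def normal_index(p1):
    """The same assumed index for the standard normal distribution."""
    nd = NormalDist()
    iqr = nd.inv_cdf(0.75) - nd.inv_cdf(0.25)
    return iqr / (nd.inv_cdf(1.0 - p1) - nd.inv_cdf(p1))

def distance(p1):
    """Absolute difference between the Cauchy and normal indexes."""
    return abs(cauchy_index(p1) - normal_index(p1))

def maximize(f, lo, hi, tol=1e-10):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# The search interval is an assumption; the distance is unimodal on it.
p1_star = maximize(distance, 1e-4, 0.2)
print(p1_star, 1.0 - p1_star, cauchy_index(p1_star))
```

Because both indexes are location-scale invariant under this form, the standard Cauchy and standard normal distributions are sufficient for the search.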

In the simulation study, we consider and with , , , , , and . In the simulation study, it is observed that the value of can be chosen in practice.

It is clear from Table 1 that a value of the index in equation (3) below 0.069776725 indicates that the distribution has a heavier tail than the Cauchy distribution, and that most of the listed distributions have index values greater than 0.069776725. From this point of view, a test statistic for a GOF test of the Cauchy distribution based on sample quantiles can be proposed. Let be a random sample of size . Let and be the hypotheses: comes from the Cauchy distribution. does not come from the Cauchy distribution.
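To illustrate how the index orders distributions by tail weight, the following sketch evaluates it for a few distributions with closed-form quantile functions. It assumes the index is (x_p3 − x_p2) / (x_p4 − x_p1) with p = (0.02217468, 0.25, 0.75, 0.97782532), which is an inference from the values quoted in the text; the choice of distributions is illustrative, and the output is not claimed to reproduce Table 1.

```python
import math
from statistics import NormalDist

P = (0.02217468, 0.25, 0.75, 0.97782532)  # assumed p vector

def index(quantile, p=P):
    """Assumed index: inner quantile range divided by outer quantile range."""
    x1, x2, x3, x4 = (quantile(pi) for pi in p)
    return (x3 - x2) / (x4 - x1)

def laplace_quantile(p):
    """Quantile function of the standard Laplace distribution."""
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))

quantile_functions = {
    "Cauchy": lambda p: math.tan(math.pi * (p - 0.5)),
    "Laplace": laplace_quantile,
    "logistic": lambda p: math.log(p / (1.0 - p)),
    "normal": NormalDist().inv_cdf,
    "uniform": lambda p: p,
}

indexes = {name: index(q) for name, q in quantile_functions.items()}
for name, value in indexes.items():
    print(f"{name:10s} {value:.6f}")
```

Under these assumptions the index grows as the tails become lighter, which matches the interpretation given for Table 1.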

In order to test against , the GOF test statistic is defined as where is a -th sample quantile based on sample . The proposed statistic is also the estimator of in equation (4). The test statistic in equation (7) lies between , and both small and large values of indicate that the data come from a non-Cauchy distribution. The test function, based on the test statistic , can be defined as where and can be determined by simulation for a fixed sample size and significance level. Although a single statistic is defined here, each choice of p gives a different statistic, so the proposal is in fact an infinite family of test statistics.
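A sample version of the statistic can be sketched as follows. The ratio form and the p vector are the same assumptions as for the population index, and the linear-interpolation quantile rule (R's default, type 7) is also an assumption, since the quantile definition used by the author is not stated here. The function names are hypothetical.

```python
import math
import random

P = (0.02217468, 0.25, 0.75, 0.97782532)  # assumed p vector

def sample_quantile(sorted_x, p):
    """Linear-interpolation sample quantile (type 7 rule) of a sorted list."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])

def quantile_ratio_statistic(x, p=P):
    """Assumed test statistic: the index with sample quantiles plugged in."""
    xs = sorted(x)
    q1, q2, q3, q4 = (sample_quantile(xs, pi) for pi in p)
    return (q3 - q2) / (q4 - q1)

# Standard Cauchy sample generated by inverting the cdf.
rng = random.Random(1)
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100)]

t_stat = quantile_ratio_statistic(sample)
# Shifting and rescaling the data leaves the statistic unchanged.
t_rescaled = quantile_ratio_statistic([5.0 + 2.5 * v for v in sample])
print(t_stat, t_rescaled)
```

The invariance check at the end mirrors the claim that the index does not depend on the location and scale parameters.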

Table 2 shows the critical values and , simulated with 100000 replications, for significance level 0.05 and sample sizes , and 150. The six cases , , , , , and are considered for test in this table. Since the null distribution of the test statistic is asymmetric, equal-tail critical values do not give better power performance. Therefore, the critical values and are determined by the highest density interval (HDI) bounds. The HDI is the interval that contains the required probability mass such that all points within the interval have a higher probability density than points outside it. The R function “hdi” [17] in the library “HDInterval” is used to obtain the critical values. The remaining tables for and are presented in the supplementary file (see Tables 12 and 13).
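The simulation of HDI-based critical values can be sketched in a few lines. This is an illustration of the shortest-interval idea for a sample of simulated statistics, not the code of the “HDInterval” package or of the author. The statistic, the p vector, the quantile rule, and the reduced number of replications are all assumptions made to keep the example small.

```python
import math
import random

P = (0.02217468, 0.25, 0.75, 0.97782532)  # assumed p vector

def sample_quantile(sorted_x, p):
    """Linear-interpolation sample quantile (type 7 rule)."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])

def statistic(x):
    """Assumed quantile-ratio statistic."""
    xs = sorted(x)
    q1, q2, q3, q4 = (sample_quantile(xs, pi) for pi in P)
    return (q3 - q2) / (q4 - q1)

def shortest_interval(values, mass=0.95):
    """Shortest interval containing the given fraction of the values."""
    v = sorted(values)
    k = math.ceil(mass * len(v))  # number of points that must fall inside
    start = min(range(len(v) - k + 1), key=lambda i: v[i + k - 1] - v[i])
    return v[start], v[start + k - 1]

def simulate_null(n, reps, seed=0):
    """Simulated null distribution of the statistic under the standard Cauchy."""
    rng = random.Random(seed)
    return [
        statistic([math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)])
        for _ in range(reps)
    ]

null_stats = simulate_null(n=50, reps=2000)
c_low, c_high = shortest_interval(null_stats, mass=0.95)
print(c_low, c_high)
```

With an asymmetric null distribution, this interval is shorter than the equal-tail interval with the same coverage, which is the motivation given for using HDI bounds.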

In the following, we obtain the asymptotic null distribution of the test statistic . Let be the random sample from the Cauchy distribution with the pdf in equation (1). One can write where and are the pdf and cdf of a Cauchy distribution with parameters and , , , and

Let us consider the statistic , which is a tool for testing GOF to the Cauchy distribution, where . Using the result in equation (9) and Cramér’s delta rule, the asymptotic null distribution of can be obtained as where with

Note that is invariant to the values of the Cauchy parameters and . Since the null distribution of the test statistic is not symmetric, the p value of the test is not well defined. Rohatgi and Saleh [18] indicated that “if the distribution is not symmetric, the p value is not well defined in the two-sided case, although many authors recommend doubling the one-sided p value.” Although the exact null distribution of is not symmetric, the asymptotic distribution in equation (11) is symmetric. We can therefore obtain an approximate p value as

The R function shown in the supplementary file (see Figure 11) can be used to compute the p values in equation (14) by using simulation and the asymptotic distribution in equation (11).
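The asymptotic approximation can be illustrated with the standard large-sample theory for sample quantiles. The sketch below is not the supplementary R function. It assumes the quantile-ratio form of the statistic, uses the textbook asymptotic covariance of sample quantiles, p_i(1 − p_j) / (f(x_i) f(x_j)) for p_i ≤ p_j, applies the delta method to the ratio, and doubles the smaller normal tail to obtain a two-sided p value. All function names are hypothetical.

```python
import math
from statistics import NormalDist

P = (0.02217468, 0.25, 0.75, 0.97782532)  # assumed p vector

def cauchy_quantile(p):
    return math.tan(math.pi * (p - 0.5))

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def null_index(p=P):
    """Assumed index under the Cauchy null hypothesis."""
    x = [cauchy_quantile(pi) for pi in p]
    return (x[2] - x[1]) / (x[3] - x[0])

def asymptotic_sd(p=P):
    """Delta-method standard deviation of sqrt(n) * (statistic - index)."""
    x = [cauchy_quantile(pi) for pi in p]
    f = [cauchy_pdf(xi) for xi in x]
    num, den = x[2] - x[1], x[3] - x[0]
    # Gradient of (x3 - x2) / (x4 - x1) with respect to (x1, x2, x3, x4).
    grad = [num / den**2, -1.0 / den, 1.0 / den, -num / den**2]
    var = 0.0
    for i in range(4):
        for j in range(4):
            a, b = min(p[i], p[j]), max(p[i], p[j])
            var += grad[i] * grad[j] * a * (1.0 - b) / (f[i] * f[j])
    return math.sqrt(var)

def approx_p_value(t, n, p=P):
    """Two-sided p value from the symmetric normal approximation."""
    z = math.sqrt(n) * (t - null_index(p)) / asymptotic_sd(p)
    tail = NormalDist().cdf(z)
    return 2.0 * min(tail, 1.0 - tail)

print(asymptotic_sd(), approx_p_value(0.12, n=100))
```

Because the index and the gradient are location-scale invariant, the standard Cauchy distribution is sufficient for computing the asymptotic variance.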

The approximation behavior of the asymptotic distribution is demonstrated in Figure 1. A large sample size is needed for a satisfactory approximation. However, the simulation study indicates that the approximate p values are close to the simulated p values for moderate sample sizes.

[Figure 1, panels (a)–(d)]

The empirical type-I error rates are given for the Cauchy distribution with different parameters in the supplementary file (see Table 11). It is observed that these simulated type-I error rates are not affected by the Cauchy parameters, which indicates that the statistic can be used for GOF test of Cauchy distribution.

3. Power Analysis

Let be a random sample from a population with cdf and pdf , and let the corresponding order statistics be . The null and alternative hypotheses of the GOF test can be expressed as where is the cdf in equation (2).

In this study, we consider some existing test procedures for testing against , which are described below. The well-known test statistics Kolmogorov–Smirnov (KS), Anderson–Darling , and Cramér–von Mises are given, respectively, by where and are reasonable estimates of the location and scale parameters. In the simulation study and the numerical examples, we use to estimate the parameters and ; these estimators are available in closed form. Recently, some GOF tests for the Cauchy distribution based on the Kullback–Leibler divergence were introduced in [14, 15]. These test statistics, which are obtained by using different kinds of Shannon entropy estimators, are defined as where ’s are

Furthermore, is the window size which is given in [15], is a symmetric kernel function, and (in the simulation study, it is fixed at 1.06 s where s is the sample standard deviation) is the bandwidth: and ’s are

Gürtler and Henze [12] proposed a test based on the empirical characteristic function, which is given by where

Zhang [13] introduced three GOF test statistics based on likelihood ratio, and these are given by

Villaseñor and González-Estrada [16] proposed a test based on the Anderson–Darling test, which is given by where , and .
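Of the existing tests described above, the three classical statistics based on the empirical distribution function have standard computing formulas, sketched below. The formulas for KS, Anderson–Darling, and Cramér–von Mises are the usual textbook ones. The parameter estimators are an assumption: the sample median for location and half the interquartile range for scale are used here as one closed-form choice, which may differ from the estimators used in the paper.

```python
import math
import random
from statistics import median

def sample_quantile(sorted_x, p):
    """Linear-interpolation sample quantile (type 7 rule)."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])

def cauchy_cdf(x, mu, sigma):
    return 0.5 + math.atan((x - mu) / sigma) / math.pi

def edf_statistics(x):
    """KS, Anderson-Darling and Cramer-von Mises statistics for a Cauchy fit."""
    xs = sorted(x)
    n = len(xs)
    mu_hat = median(xs)  # assumed location estimator
    sigma_hat = 0.5 * (sample_quantile(xs, 0.75) - sample_quantile(xs, 0.25))  # assumed scale estimator
    z = [cauchy_cdf(v, mu_hat, sigma_hat) for v in xs]
    ks = max(max((i + 1) / n - z[i], z[i] - i / n) for i in range(n))
    ad = -n - sum(
        (2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
        for i in range(n)
    ) / n
    cvm = 1.0 / (12 * n) + sum(
        (z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )
    return ks, ad, cvm

rng = random.Random(7)
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(50)]
ks, ad, cvm = edf_statistics(sample)
print(ks, ad, cvm)
```

Since the parameters are estimated from the data, the critical values of these statistics have to be obtained by simulation rather than from the tables for a fully specified null distribution.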

In this section, we use a simulation study to evaluate the power performance of the proposed test and compare it with those existing test procedures. The power of different test procedures is estimated with 100000 trials with significance level of 5% for different alternatives. The six cases , , , , , and are considered in the simulation study. The following distributions and the corresponding R packages are considered in the simulation study. We refer to [19–22] for the R libraries “stats,” “extraDistr,” “distr,” and “flexsurv,” respectively.
(i) The distributions used in R library “stats”: normal, t, logistic, beta, gamma, uniform, exponential, and Weibull.
(ii) The distributions used in R library “extraDistr”: Pareto, Laplace, Gumbel, and Lomax.
(iii) The distributions used in R library “distr”: arcsine.
(iv) The distributions used in R library “flexsurv”: loglogistic.
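The structure of such a power study can be sketched on a small scale. The example below estimates the empirical size under the Cauchy null and the power against a standard normal alternative for one sample size. The statistic, the p vector, the quantile rule, and the shortest-interval critical values are the same assumptions as before, and the replication counts are far smaller than the 100000 trials used in the paper, so the numbers are only indicative.

```python
import math
import random

P = (0.02217468, 0.25, 0.75, 0.97782532)  # assumed p vector

def sample_quantile(xs, p):
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def statistic(x):
    """Assumed quantile-ratio statistic."""
    xs = sorted(x)
    q1, q2, q3, q4 = (sample_quantile(xs, pi) for pi in P)
    return (q3 - q2) / (q4 - q1)

def shortest_interval(values, mass):
    """Shortest interval containing the given fraction of the values."""
    v = sorted(values)
    k = math.ceil(mass * len(v))
    start = min(range(len(v) - k + 1), key=lambda i: v[i + k - 1] - v[i])
    return v[start], v[start + k - 1]

def draw_cauchy(rng):
    return math.tan(math.pi * (rng.random() - 0.5))

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def rejection_rate(draw, n, c_low, c_high, reps, rng):
    """Fraction of simulated samples whose statistic falls outside [c_low, c_high]."""
    rejections = 0
    for _ in range(reps):
        t = statistic([draw(rng) for _ in range(n)])
        rejections += (t < c_low) or (t > c_high)
    return rejections / reps

rng = random.Random(2023)
n = 100
null_stats = [statistic([draw_cauchy(rng) for _ in range(n)]) for _ in range(2000)]
c_low, c_high = shortest_interval(null_stats, 0.95)

size_estimate = rejection_rate(draw_cauchy, n, c_low, c_high, 1000, rng)
power_normal = rejection_rate(draw_normal, n, c_low, c_high, 1000, rng)
print(size_estimate, power_normal)
```

Other alternatives can be studied by replacing `draw_normal` with a generator for the distribution of interest.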

In the power study, in addition to the well-known distributions, data are also generated from other distributions under the alternative hypothesis. Suppose , and are independent random variables with distributions t (2), normal (0, 1), uniform (−1, 1), and Cauchy (0, 1), respectively. Let us consider the following transformations, which may have heavy-tailed distributions (HTD): , , , , , , , and . The distributions of these transformations are denoted by , respectively.

The simulation results are presented in Figures 2 and 3. The other power analysis results are available in the supplementary file (see Tables 1–10 and Figures 1–10). From these tables and figures, we observe that if the underlying distributions are symmetric, the proposed test procedure has a good performance among the tests considered here for almost all the sample sizes. The test based on gives desired power values for almost all the sample sizes when the data come from distributions with lighter tails (say ). For large sample cases, the test has a similar power performance compared to the other test procedures. The test does not work well when the data come from a distribution with heavier tails than the Cauchy distribution for small and moderate sample cases. However, and have a better power than , , , and .

For the detailed comparison, the following findings may be illustrative. For small sample cases, tests are competitive with tests. Also, tests are better than the tests based on the empirical cumulative distribution function (ecdf) and the Z tests. For moderate and large sample cases, tests are better than tests. It should be noted that the Z tests are better compared to other tests. tests have better performance for the large sample and large index cases. When the index value increases, it is observed that the test outstrips the other tests. The tests have low power when the distribution has a heavier tail than the Cauchy distribution (that is, ). However, test has the desired power when the sample size increases. When the index for the alternative distribution becomes smaller, and should be chosen large and small, respectively.

Figure 4 shows the performance plots of the test statistics versus the index values of the alternative distributions. Figure 4 gives detailed information to compare the tests and to observe the behavior of the test for the selection of . The power of test is low when is around . It can be seen that the power of the test statistic increases as becomes larger than for all . When is less than and the sample size is small, the powers of the and test statistics are higher than those of the other test statistics. When is greater than , it can be seen that the powers of the and test statistics outperform the other tests.


4. Illustrative Examples

In this section, for illustrative purposes, 30 returns of closing prices of the German Stock Index (DAX) are considered. The data, presented in Table 3, are also used in [14, 15]. The GOF test statistics and the corresponding p values are presented in Table 4. The proposed statistic and the related p value have been calculated using the function available in the supplementary file (see Figure 11). Considering Figure 5 and Table 4, the null hypothesis that the data follow the Cauchy distribution is not rejected at a significance level of 0.05.

In order to observe the ability of the GOF tests, we generate data from the t distribution with 3 degrees of freedom, which is close to the Cauchy distribution. The simulated data are presented in Table 5. A Q-Q plot is also provided in Figure 6 to show that the data do not fit the Cauchy distribution. The results are given in Table 6. It is clear from Table 6 that , , , , , , , , , , , , and reject the null hypothesis at a significance level of 0.05.

5. Conclusion

In this article, a novel GOF test procedure based on quantiles is proposed for the Cauchy distribution. The simulation study reveals that the proposed GOF test procedure has good power performance and can be used to test whether the data come from the Cauchy distribution. An R function is provided to compute the test statistic and the approximate p value. As a future study, the statistic can be extended to GOF testing for any continuous distribution without a shape parameter. The introduced test can also be extended to different types of censoring schemes such as progressive or progressive first-failure censoring.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Supplementary Materials

All tables and power graphs derived from the simulations, and the R program code of the AP. test function that calculates the proposed test statistic and p value. (Supplementary Materials)

References

[1] N. L. Johnson, S. Kotz, and N. Balakrishnan, “Continuous univariate distributions,” John Wiley and Sons, vol. 289, pp. 93–112, 1995.
[2] A. Alzaatreh, C. Lee, F. Famoye, and I. Ghosh, “The generalized Cauchy family of distributions with applications,” Journal of Statistical Distributions and Applications, vol. 3, no. 1, pp. 12–16, 2016.
[3] E. Hecht, Optics, Addison-Wesley, 1987.
[4] S. F. Gull, “Bayesian inductive inference and maximum entropy,” in Maximum-entropy and Bayesian Methods in Science and Engineering, pp. 53–74, Springer, Dordrecht, Netherlands, 1988.
[5] T. Liu, P. Zhang, W. S. Dai, and M. Xie, “An intermediate distribution between Gaussian and Cauchy distributions,” Physica A: Statistical Mechanics and Its Applications, vol. 391, no. 22, pp. 5411–5421, 2012.
[6] S. Stapf, R. Kimmich, R. O. Seitter, A. I. Maklakov, and V. D. Skirda, “Proton and deuteron field-cycling NMR relaxometry of liquids confined in porous glasses,” Colloids and Surfaces A: Physicochemical and Engineering Aspects, vol. 115, pp. 107–114, 1996.
[7] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice with MATLAB, John Wiley and Sons, Hoboken, NJ, USA, 2014.
[8] S. S. Winterton, T. J. Smy, and N. G. Tarr, “On the source of scatter in contact resistance data,” Journal of Electronic Materials, vol. 21, no. 9, pp. 917–921, 1992.
[9] Y. Y. Kagan, “Correlations of earthquake focal mechanisms,” Geophysical Journal International, vol. 110, no. 2, pp. 305–320, 1992.
[10] I. A. Min, I. Mezić, and A. Leonard, “Levy stable distributions for velocity and velocity difference in systems of vortex elements,” Physics of Fluids, vol. 8, no. 5, pp. 1169–1180, 1996.
[11] B. P. Roe, “Basic probability concepts,” in Probability and Statistics in Experimental Physics, pp. 1–4, Springer, Heidelberg, Germany, 1992.
[12] N. Gürtler and N. Henze, “Goodness-of-fit tests for the Cauchy distribution based on the empirical characteristic function,” Annals of the Institute of Statistical Mathematics, vol. 52, no. 2, pp. 267–286, 2000.
[13] J. Zhang, “Powerful goodness-of-fit tests based on the likelihood ratio,” Journal of the Royal Statistical Society: Series B, vol. 64, no. 2, pp. 281–294, 2002.
[14] M. Mahdizadeh and E. Zamanzade, “New goodness of fit tests for the Cauchy distribution,” Journal of Applied Statistics, vol. 44, no. 6, pp. 1106–1121, 2017.
[15] M. Mahdizadeh and E. Zamanzade, “Goodness-of-fit testing for the Cauchy distribution with application to financial modeling,” Journal of King Saud University Science, vol. 31, no. 4, pp. 1167–1174, 2019.
[16] J. A. Villaseñor and E. González-Estrada, “Goodness-of-fit tests for Cauchy distributions using data transformations,” in Advances in Statistics - Theory and Applications. Emerging Topics in Statistics and Biostatistics, I. Ghosh, N. Balakrishnan, and H. K. T. Ng, Eds., Springer, Heidelberg, Germany, 2021.
[17] M. Meredith and J. Kruschke, “HDInterval: highest (posterior) density intervals,” 2020, https://CRAN.R-project.org/package=HDInterval.
[18] V. K. Rohatgi and A. K. M. E. Saleh, “Nonparametric statistical inference,” in An Introduction to Probability and Statistics, John Wiley and Sons, 2001.
[19] R Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria, 2019, https://www.R-project.org/.
[20] T. Wolodzko, “extraDistr: additional univariate and multivariate distributions,” 2020, https://CRAN.R-project.org/package=extraDistr.
[21] F. Camphausen, M. Kohl, P. Ruckdeschel, and T. Stabla, “distr: object oriented implementation of distributions,” 2019, https://CRAN.R-project.org/package=distr.
[22] C. Jackson, P. Metcalfe, J. Amdahl, M. T. Warkentin, and K. Kunzmann, “flexsurv: flexible parametric survival and multi-state models,” 2021, https://CRAN.R-project.org/package=flexsurv.

Copyright

Copyright © 2023 A. Pekgör. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
