Masoumeh Akbari, "Characterization and Goodness-of-Fit Test of Pareto and Some Related Distributions Based on Near-Order Statistics", Journal of Probability and Statistics, vol. 2020, Article ID 4262574, 9 pages, 2020. https://doi.org/10.1155/2020/4262574
Characterization and Goodness-of-Fit Test of Pareto and Some Related Distributions Based on Near-Order Statistics
In this paper, a new definition of the number of observations near the $k$th order statistic is developed. Then some characterization results for the Pareto and some related distributions are established in terms of the probability mass function and the first moment of these new counting random variables, using completeness properties of the sequence of functions $\{x^n, n \geq 1\}$. Finally, new goodness-of-fit tests for the Pareto distribution based on these characterizations are presented, and their power is compared with that of well-known tests such as the Kolmogorov–Smirnov and Cramér–von Mises tests by Monte Carlo simulations.
1. Introduction
Let the order statistics $X_{1:n} \leq X_{2:n} \leq \cdots \leq X_{n:n}$ be the nondecreasing arrangement of independent and identically distributed (iid) random variables $X_1, \ldots, X_n$ with cumulative distribution function (cdf) $F$. Order statistics play important roles in many areas of statistics and probability, and in practice some of them are of particular interest. For example, in actuarial science, the distribution of the minimum of the two lifespans of a couple is important for decisions about insurance policies. In industry, specifically in reliability and survival analysis, order statistics are used routinely. Meteorology and hydrology are further fields of application. Interested readers can find the theory and applications of order statistics in, for instance, Arnold et al. [1].

Let $F$ be a discrete cdf. Eisenberg et al. [2] were the first to define the quantity $K_n = \#\{j \in \{1, \ldots, n\} : X_j = X_{n:n}\}$ as the number of winners in a golf competition. They studied sufficient conditions under which $K_n$ converges to 1. Subsequently, Pakes and Steutel [3] considered a similar notion for a continuous cdf $F$ as follows:
$$K_n(a) = \#\{j \in \{1, \ldots, n\} : X_j \in (X_{n:n} - a, X_{n:n}]\}, \tag{1}$$
where $a > 0$ is a constant. In fact, $K_n(a)$ counts the number of observations in the left-hand neighbourhood of the sample maximum with fixed distance "$a$." Later, $K_n(a)$ was extended to the number of observations near the $k$th order statistic as follows:
$$K_-(n, k, a) = \#\{j \in \{1, \ldots, n\} : X_j \in (X_{k:n} - a, X_{k:n})\}, \tag{2}$$
where the support of $K_-(n, k, a)$ is $\{0, 1, \ldots, k-1\}$. Similarly, the number of observations in the right-hand neighbourhood of the $k$th order statistic was defined as
$$K_+(n, k, a) = \#\{j \in \{1, \ldots, n\} : X_j \in (X_{k:n}, X_{k:n} + a)\}. \tag{3}$$
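The support of $K_+(n, k, a)$ is $\{0, 1, \ldots, n-k\}$. The probability mass function (pmf) of $K_-(n, k, a)$ is as follows:
$$P(K_-(n, k, a) = i) = \binom{k-1}{i} \int \left[1 - \frac{F(x-a)}{F(x)}\right]^{i} \left[\frac{F(x-a)}{F(x)}\right]^{k-1-i} dF_{k:n}(x), \tag{4}$$
where $i = 0, 1, \ldots, k-1$ and $F_{k:n}$ is the cdf of $X_{k:n}$. Also, the pmf of $K_+(n, k, a)$ is given by
$$P(K_+(n, k, a) = i) = \binom{n-k}{i} \int \left[1 - \frac{\bar{F}(x+a)}{\bar{F}(x)}\right]^{i} \left[\frac{\bar{F}(x+a)}{\bar{F}(x)}\right]^{n-k-i} dF_{k:n}(x), \tag{5}$$
where $i = 0, 1, \ldots, n-k$ and $\bar{F} = 1 - F$. For more details, one can refer to Dembińska et al. [4].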
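The two location-type counts can be illustrated with a few lines of Python. This is only a sketch: the function name `near_order_counts` is hypothetical, and the neighbourhoods are assumed to be the open intervals $(x_{k:n} - a, x_{k:n})$ and $(x_{k:n}, x_{k:n} + a)$, which for a continuous cdf differs from half-open intervals only on an event of probability zero.

```python
def near_order_counts(sample, k, a):
    """Count observations strictly inside (x_k - a, x_k) and (x_k, x_k + a),
    where x_k is the k-th smallest value of `sample` (k is 1-based).

    Assumption: open intervals on both sides of the k-th order statistic.
    """
    if not 1 <= k <= len(sample):
        raise ValueError("k must satisfy 1 <= k <= n")
    if a <= 0:
        raise ValueError("a must be positive")
    xs = sorted(sample)
    x_k = xs[k - 1]
    k_minus = sum(1 for x in xs if x_k - a < x < x_k)  # left-hand neighbourhood
    k_plus = sum(1 for x in xs if x_k < x < x_k + a)   # right-hand neighbourhood
    return k_minus, k_plus


sample = [2.0, 0.5, 1.4, 3.1, 1.9, 2.6]
# The 4th order statistic is 2.0; 1.4 and 1.9 lie in (1.3, 2.0), 2.6 lies in (2.0, 2.7).
print(near_order_counts(sample, k=4, a=0.7))  # (2, 1)
```

The left-hand count can never exceed $k-1$ and the right-hand count can never exceed $n-k$, which matches the supports stated above.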
Further results on their distributions, asymptotic properties, and generalizations have been investigated; see, e.g., Pakes and Li [5], Li [6], Pakes [7, 8], Balakrishnan and Stepanov [9, 10], Dembińska et al. [4], and Dembińska [11–13]. So far, few researchers have addressed statistical inference based on near-order statistics, e.g., Müller [14], Hashorva and Hüsler [15], Akbari et al. [16], and Akbari and Akbari [17]. In the present paper, a new version of near-order statistics is first defined. Then some characterization results are obtained as a statistical tool for goodness-of-fit (GOF) testing of some continuous distributions. The paper is organized as follows. Section 2 contains preliminary results. The characterization results are given in Section 3. Finally, two GOF tests for the Pareto distribution are introduced in Section 4. The critical values of the proposed test statistics are computed by Monte Carlo simulations, and their power is compared by simulation with that of well-known tests such as the Kolmogorov–Smirnov and Cramér–von Mises tests. All simulations are carried out in R 3.6.3 with 10000 replications.
2. Preliminary Results
Let $X_1, \ldots, X_n$ be iid random variables from a continuous cdf $F$ with support , and let $F$ be one of the Pareto, uniform, or power function distribution functions. Constructing characterizations of such an $F$ through the pmfs or moments of functions of random variables is a common approach. However, this is not possible with the counting random variables defined on location-type neighbourhoods of an order statistic in equations (4) and (5), because their pmfs do not have a closed-form expression. Therefore, new counts of the observations falling in the left-hand and right-hand neighbourhoods of a given order statistic, obtained by passing to scale-type neighbourhoods, are introduced, respectively, as follows:
$$K_-^*(n, k, a) = \#\{j \in \{1, \ldots, n\} : X_j \in (aX_{k:n}, X_{k:n})\}, \tag{6}$$
where $0 < a < 1$, and
$$K_+^*(n, k, b) = \#\{j \in \{1, \ldots, n\} : X_j \in (X_{k:n}, bX_{k:n})\}, \tag{7}$$
where $b > 1$.
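As a concrete illustration of scale-type neighbourhoods, the sketch below counts observations in $(a x_{k:n}, x_{k:n})$ and $(x_{k:n}, b x_{k:n})$ for a positive sample. The function name and the choice of open intervals with $0 < a < 1 < b$ are assumptions made for this example. It also shows the property that motivates the construction: multiplying every observation by a positive constant leaves both counts unchanged.

```python
def scale_near_order_counts(sample, k, a, b):
    """Counts in the scale-type neighbourhoods (a*x_k, x_k) and (x_k, b*x_k).

    Assumptions: positive observations, 0 < a < 1 < b, open intervals,
    and k is the 1-based index of the order statistic.
    """
    if not (0 < a < 1 < b):
        raise ValueError("need 0 < a < 1 < b")
    xs = sorted(sample)
    if xs[0] <= 0:
        raise ValueError("scale-type neighbourhoods need positive data")
    x_k = xs[k - 1]
    left = sum(1 for x in xs if a * x_k < x < x_k)
    right = sum(1 for x in xs if x_k < x < b * x_k)
    return left, right


sample = [2.0, 0.5, 1.4, 3.1, 1.9, 2.6]
counts = scale_near_order_counts(sample, k=4, a=0.5, b=1.5)
# Rescaling the data rescales the neighbourhoods by the same factor.
rescaled = scale_near_order_counts([10 * x for x in sample], k=4, a=0.5, b=1.5)
print(counts, rescaled)  # (2, 1) (2, 1)
```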
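Proposition 1. Using the same arguments as in Dembińska et al. [4], the pmfs of the new counting random variables $K_-^*(n, k, a)$ and $K_+^*(n, k, b)$ are the same as the pmfs of $K_-(n, k, a)$ and $K_+(n, k, a)$, respectively, with $F(x-a)$ and $F(x+a)$ replaced by $F(ax)$ and $F(bx)$, that is,
$$P(K_-^*(n, k, a) = i) = \binom{k-1}{i} \int \left[1 - \frac{F(ax)}{F(x)}\right]^{i} \left[\frac{F(ax)}{F(x)}\right]^{k-1-i} dF_{k:n}(x), \quad i = 0, 1, \ldots, k-1, \tag{8}$$
$$P(K_+^*(n, k, b) = i) = \binom{n-k}{i} \int \left[1 - \frac{\bar{F}(bx)}{\bar{F}(x)}\right]^{i} \left[\frac{\bar{F}(bx)}{\bar{F}(x)}\right]^{n-k-i} dF_{k:n}(x), \quad i = 0, 1, \ldots, n-k, \tag{9}$$
where $\bar{F} = 1 - F$ and $F_{k:n}$ is the cdf of $X_{k:n}$.
On the other hand, by simple algebraic calculations, the first moments of $K_-^*(n, k, a)$ and $K_+^*(n, k, b)$ can be derived, respectively, as follows:
$$E[K_-^*(n, k, a)] = (k-1) \int \left[1 - \frac{F(ax)}{F(x)}\right] dF_{k:n}(x), \tag{10}$$
$$E[K_+^*(n, k, b)] = (n-k) \int \left[1 - \frac{\bar{F}(bx)}{\bar{F}(x)}\right] dF_{k:n}(x). \tag{11}$$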
Here, some results as examples of special cases of Pareto and power function distributions are reported that will be useful for obtaining further results in the next section.
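Example 1. Let $X_1, \ldots, X_n$ be iid random variables from Pareto ($\alpha$, $\sigma$), with shape parameter $\alpha > 0$ and scale parameter $\sigma > 0$. So their survival function is given by
$$\bar{F}(x) = \left(\frac{x}{\sigma}\right)^{-\alpha}, \quad x \geq \sigma. \tag{12}$$
Then from equation (9), the pmf of the right-hand count $K_+^*(n, k, b)$, the number of observations in $(X_{k:n}, bX_{k:n})$ with $b > 1$, is as follows:
$$P(K_+^*(n, k, b) = i) = \binom{n-k}{i} \left(1 - b^{-\alpha}\right)^{i} b^{-\alpha(n-k-i)}, \quad i = 0, 1, \ldots, n-k. \tag{13}$$
Equation (13) shows that $K_+^*(n, k, b)$ has a binomial pmf with parameters $n-k$ and $1 - b^{-\alpha}$. Thus,
$$E[K_+^*(n, k, b)] = (n-k)\left(1 - b^{-\alpha}\right), \tag{14}$$
which does not depend on the scale parameter $\sigma$.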
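The scale-free mean in Example 1 can be checked by simulation. The sketch below assumes that the right-hand count is the number of observations in the open interval $(x_{k:n}, b x_{k:n})$ with $b > 1$ and that the Pareto survival function is $(x/\sigma)^{-\alpha}$ for $x \geq \sigma$; under those assumptions the count is binomial with $n-k$ trials and success probability $1 - b^{-\alpha}$. Function names, parameter values and the seed are illustrative only.

```python
import random


def k_plus_scale(sample, k, b):
    """Number of observations in (x_k, b*x_k); k is 1-based, b > 1 (assumed form)."""
    xs = sorted(sample)
    x_k = xs[k - 1]
    return sum(1 for x in xs if x_k < x < b * x_k)


def simulated_mean(alpha, sigma, n, k, b, reps, seed):
    """Monte Carlo estimate of the mean right-hand count for Pareto(alpha, sigma) data."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        # random.paretovariate(alpha) has survival x**(-alpha) on x >= 1,
        # so multiplying by sigma gives scale sigma.
        sample = [sigma * rng.paretovariate(alpha) for _ in range(n)]
        total += k_plus_scale(sample, k, b)
    return total / reps


alpha, n, k, b = 2.0, 10, 4, 1.5
theory = (n - k) * (1 - b ** (-alpha))  # mean of Binomial(n - k, 1 - b**(-alpha))
for sigma in (1.0, 50.0):
    estimate = simulated_mean(alpha, sigma, n, k, b, reps=5000, seed=1)
    print(f"sigma={sigma}: simulated {estimate:.3f}, theoretical {theory:.3f}")
```

Both values of the scale parameter should give estimates close to the same theoretical mean, since rescaling the sample does not change the count.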
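Example 2. It is well known that if $X$ ∼ Pareto ($\alpha$, $\sigma$), then the random variable $Y = 1/X$ has a power function distribution with the following cdf:
$$F(y) = \left(\frac{y}{\beta}\right)^{\alpha}, \quad 0 < y < \beta, \quad \beta = 1/\sigma. \tag{15}$$
The notation power ($\alpha$, $\beta$) is used for the power function distribution with parameters $\alpha$ and $\beta$. It is also called the generalized uniform distribution because it reduces to the standard uniform cdf at $\alpha = 1$ and $\beta = 1$. From (8), the pmf of the left-hand count $K_-^*(n, k, a)$, the number of observations in $(aX_{k:n}, X_{k:n})$ with $0 < a < 1$, for random variables distributed as power ($\alpha$, $\beta$) is
$$P(K_-^*(n, k, a) = i) = \binom{k-1}{i} \left(1 - a^{\alpha}\right)^{i} a^{\alpha(k-1-i)}, \quad i = 0, 1, \ldots, k-1. \tag{16}$$
According to (16), $K_-^*(n, k, a)$ has a binomial pmf with parameters $k-1$ and success probability $1 - a^{\alpha}$. So
$$E[K_-^*(n, k, a)] = (k-1)\left(1 - a^{\alpha}\right). \tag{17}$$
As we know, the uniform distribution on the interval $(0, 1)$ is the special case power (1,1), with the following cdf:
$$F(y) = y, \quad 0 < y < 1. \tag{18}$$
Therefore from (16), the pmf of $K_-^*(n, k, a)$ when $\alpha = 1$ equals
$$P(K_-^*(n, k, a) = i) = \binom{k-1}{i} (1 - a)^{i} a^{k-1-i}, \quad i = 0, 1, \ldots, k-1, \tag{19}$$
and its expectation is
$$E[K_-^*(n, k, a)] = (k-1)(1 - a). \tag{20}$$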
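The binomial law in Example 2 can be compared with an empirical pmf. The sketch below assumes the power function cdf $(y/c)^{\alpha}$ on $(0, c)$ (the symbol `c` is a hypothetical name for the scale) and a left-hand count defined on the open interval $(a y_{k:n}, y_{k:n})$ with $0 < a < 1$. Under those assumptions the count is binomial with $k-1$ trials and success probability $1 - a^{\alpha}$, whatever the value of `c`.

```python
import math
import random


def k_minus_scale(sample, k, a):
    """Number of observations in (a*x_k, x_k); k is 1-based, 0 < a < 1 (assumed form)."""
    xs = sorted(sample)
    x_k = xs[k - 1]
    return sum(1 for x in xs if a * x_k < x < x_k)


def binomial_pmf(trials, p):
    """Full pmf of Binomial(trials, p) as a list indexed by the outcome."""
    return [math.comb(trials, i) * p**i * (1 - p) ** (trials - i)
            for i in range(trials + 1)]


def empirical_pmf(alpha, c, n, k, a, reps, seed):
    """Empirical pmf of the left-hand count for power-function data."""
    rng = random.Random(seed)
    freq = [0] * k  # possible values are 0, 1, ..., k - 1
    for _ in range(reps):
        # Inverse-cdf sampling: if U is uniform(0, 1), c * U**(1/alpha) has cdf (y/c)**alpha.
        sample = [c * rng.random() ** (1 / alpha) for _ in range(n)]
        freq[k_minus_scale(sample, k, a)] += 1
    return [f / reps for f in freq]


alpha, c, n, k, a = 2.0, 3.0, 8, 4, 0.6
theory = binomial_pmf(k - 1, 1 - a**alpha)
observed = empirical_pmf(alpha, c, n, k, a, reps=5000, seed=3)
for i, (t, o) in enumerate(zip(theory, observed)):
    print(f"i={i}: binomial {t:.4f}, simulated {o:.4f}")
```

Setting `alpha = 1` and `c = 1` gives the standard uniform case, where the success probability reduces to $1 - a$.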
3. Characterization Results
In this section, some characterization results for several continuous distributions are established. They are based on distributional properties of the left-hand and right-hand near-order counts introduced in Section 2 and rely on the completeness of a sequence of functions. Thus, some notions and theorems related to this theory are recalled first.
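Definition 1. A sequence $\{\varphi_n\}_{n \geq 1}$ of elements of a Hilbert space $H$ is called complete if the only element which is orthogonal to every $\varphi_n$ is the null element, that is,
$$\langle f, \varphi_n \rangle = 0 \quad \text{for all } n \geq 1 \tag{21}$$
implies $f$ = null.
The notation $\langle f, g \rangle$ denotes the inner product of $f$ and $g$. In the present paper, the Hilbert space $L^2(0, 1)$ with the following inner product is considered:
$$\langle f, g \rangle = \int_0^1 f(x) g(x) \, dx, \tag{22}$$
where $f$ and $g$ are real-valued square integrable functions on $(0, 1)$. One of the complete sequences of functions in $L^2(0, 1)$ is $\{x^n, n \geq 1\}$, which is used in this paper.
The following theorem, known as the Müntz theorem, states the necessary and sufficient condition for completeness of the subsequence $\{x^{n_j}, j \geq 1\}$.
Theorem 1 (Higgins [18], page 95). The sequence $\{x^{n_1}, x^{n_2}, \ldots; 1 \leq n_1 < n_2 < \cdots\}$ in $L^2(0, 1)$ is complete if and only if
$$\sum_{j=1}^{\infty} \frac{1}{n_j} = +\infty. \tag{23}$$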
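A numerical illustration of the Müntz condition may help. The sketch below relies on the classical closed form for the distance in $L^2(0,1)$ from $x^m$ to the span of $x^{n_1}, \ldots, x^{n_N}$, namely $(2m+1)^{-1/2} \prod_j |m - n_j| / (m + n_j + 1)$. This formula is a standard ingredient of proofs of the theorem and is not taken from the text above. With exponents $n_j = j$ the reciprocals sum to infinity and the distance shrinks to zero, whereas with $n_j = j^2$ the reciprocals have a finite sum and the distance stays bounded away from zero. The target exponent $m = 0.5$ is an arbitrary choice.

```python
import math


def l2_distance_to_span(m, exponents):
    """Distance in L^2(0, 1) from x**m to span{x**n : n in exponents}.

    Uses the closed form (2m + 1)**(-1/2) * prod |m - n| / (m + n + 1),
    valid for distinct exponents greater than -1/2.
    """
    distance = 1.0 / math.sqrt(2 * m + 1)
    for n in exponents:
        distance *= abs(m - n) / (m + n + 1)
    return distance


m = 0.5
for size in (10, 100, 1000):
    linear = l2_distance_to_span(m, range(1, size + 1))                      # n_j = j
    squares = l2_distance_to_span(m, (j * j for j in range(1, size + 1)))    # n_j = j**2
    print(f"N={size}: n_j=j -> {linear:.3e}, n_j=j^2 -> {squares:.3e}")
```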
For more details about Hilbert spaces and complete sequences, refer to Higgins [18]. The Pareto distribution has many applications in economics and actuarial science. So far, many of its properties and characterizations based on order statistics or functions of them have been obtained; see, for example, Lee and Chang [19], Afify [20], Ahsanullah and Shakil [21], Ahsanullah et al. [22], and Nofal and El Gebaly [23]. In the following theorem, some characterizations of the Pareto law are established in terms of the number of observations in the right-hand scale-type neighbourhood of an order statistic.
Theorem 2. Let be iid random variables with continuous cdf whose support is . is Pareto () cdf, if and only if one of the following conditions holds:
(a) There exists such that for all and for , we have
(b) For all , and a fixed , we have
Proof. If sequence has Pareto () cdf, by the use of equations (13) and (14), one can easily obtain parts (a) and (b). Let part (a) hold. Using pmf of and the assumptions of (a), the equality in (a) can be rewritten as The right-hand side of equation (26) can be expressed as On the other hand, replacing with , the equality in (26) can be stated as Taking the change of variable in the left-hand side of equation (28), it is deduced: By the assumption “” and after some algebraic simplifications in the aforementioned equality, it is concluded that Since and , by Minkowski’s inequality, we have If (30) holds for all , by the completeness property of the sequence , the following identity can be derived Hence, By taking in (33), it can be rewritten as If (34) holds for all and , by the use of the method of solution given in Aczél [24], it is concluded that function is the general solution of (34). Because and is a survival function, the constant will be . So, the proof is completed.
Suppose that part (b) holds. Then from equation (11), it is deduced thatSince , the last equality, after some simplifications, can be expressed asThe rest of the proof is similar to the proof of part (a).
Some characterizations of the power function distribution have already been obtained. For example, it was characterized by Ahsanullah et al. [25] through lower records. Khan and Khan [26] and Lim and Lee [27] characterized it based on a dependency property of lower records, and Tavangar [28] presented a characterization using dual generalized order statistics. In the next theorem, new characterization results for the power function distribution are proved.
Theorem 3. Suppose that are iid continuous random variables from cdf with support . Then ’s have power function distribution with cdf (15) if and only if one of the following statements holds.
(a) For all and , there exists such that for and some , we have
(b) For a fixed and for all and some , we have
Proof. If the ’s have the power function distribution with cdf (15), parts (a) and (b) follow easily from equations (7) and (16). Let condition (a) be satisfied, then Since and , the quantity belongs to by Minkowski’s inequality. The completeness property of the sequence and equation (39) result in This is equivalent to Taking the change of variable in (41) gives The function is the general solution of the above functional equation. This completes the proof of (a). In a similar way, if (b) holds, one can easily prove that the parent population has the power function distribution.
The results of Theorem 3 can also be obtained directly from Theorem 2 by noticing that ∼ power (, ) if and only if ∼ Pareto (, ). Therefore, for where with superscript denotes the number of observations near the th order statistic related to the -sequence. According to the relationship between the power function and standard uniform distributions mentioned earlier, the following results follow from Theorem 3 and are stated without proof.
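The link between the two theorems rests on taking reciprocals, and this can be illustrated on a single sample. The sketch assumes scale-type counts on open intervals, $(a y_{k:n}, y_{k:n})$ on the left and $(x_{k:n}, b x_{k:n})$ on the right. Since $Y_j = 1/X_j$ reverses the ordering, the $k$th smallest $Y$ is the reciprocal of the $(n-k+1)$th smallest $X$, so one would expect the left-hand count for the $Y$-sample at index $k$ with constant $a$ to equal the right-hand count for the $X$-sample at index $n-k+1$ with constant $1/a$. The precise statement in the text above is not reproduced here; function names, parameter values and the seed are illustrative.

```python
import random


def k_minus_scale(sample, k, a):
    """Observations in (a*x_k, x_k), with 0 < a < 1 and 1-based k (assumed form)."""
    xs = sorted(sample)
    x_k = xs[k - 1]
    return sum(1 for x in xs if a * x_k < x < x_k)


def k_plus_scale(sample, k, b):
    """Observations in (x_k, b*x_k), with b > 1 and 1-based k (assumed form)."""
    xs = sorted(sample)
    x_k = xs[k - 1]
    return sum(1 for x in xs if x_k < x < b * x_k)


rng = random.Random(7)
n, a = 12, 0.6
x = [2.0 * rng.paretovariate(1.5) for _ in range(n)]  # Pareto-type sample
y = [1.0 / v for v in x]                              # reciprocals: power-function-type sample

pairs = [(k_minus_scale(y, k, a), k_plus_scale(x, n - k + 1, 1.0 / a))
         for k in range(1, n + 1)]
print(pairs)
print(all(left == right for left, right in pairs))
```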
Corollary 1. Let be iid continuous random variables from cdf that is supported on . Then, is standard uniform distribution if and only if one of the following statements holds:
(a) For all and , there exists such that for , we have
(b) For a fixed and for all and , we have
Remark 1. According to Theorem 1, the assumptions of Theorem 2 need not hold for all “” or “.” The same is true for Theorem 3 and Corollary 1. So, the results of Theorems 2 and 3 and Corollary 1 remain valid if their assumptions hold along any increasing subsequence or for which equality (23) holds.
Remark 2. Some characterizations of the two-parameter exponential distribution based on the counting random variable were obtained by Akbari and Akbari [17]. The characterization results in Section 2 of their paper can also be derived directly from Theorems 2 and 3. To see this, suppose that is a random variable having the two-parameter exponential distribution with parameters (, ), denoted by Exp (), with the following cdf: Since ∼Exp () if and only if ∼Pareto (), the following relationship holds between and . For , Also, ∼Exp () if and only if ∼power (). So for ,
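The correspondence between location-type and scale-type counts under exponentiation can be sketched in the same way. The code assumes a two-parameter exponential cdf of the form $1 - e^{-(x-\mu)/\sigma}$ for $x \geq \mu$ (the parametrization is an assumption) and open neighbourhoods. Because $x \mapsto e^x$ is increasing and turns the shift $x_{k:n} + a$ into the multiple $e^{a} e^{x_{k:n}}$, the right-hand location-type count for the $X$-sample with distance $a$ should equal the right-hand scale-type count for the sample $e^{X_1}, \ldots, e^{X_n}$ with constant $e^{a}$.

```python
import math
import random


def k_plus_location(sample, k, a):
    """Observations in (x_k, x_k + a), with a > 0 and 1-based k (assumed form)."""
    xs = sorted(sample)
    x_k = xs[k - 1]
    return sum(1 for x in xs if x_k < x < x_k + a)


def k_plus_scale(sample, k, b):
    """Observations in (x_k, b*x_k), with b > 1 and 1-based k (assumed form)."""
    xs = sorted(sample)
    x_k = xs[k - 1]
    return sum(1 for x in xs if x_k < x < b * x_k)


rng = random.Random(11)
mu, sigma, n, a = 1.0, 2.0, 12, 0.8
# expovariate takes a rate, so rate 1/sigma gives scale sigma; mu shifts the support.
x = [mu + rng.expovariate(1.0 / sigma) for _ in range(n)]
y = [math.exp(v) for v in x]  # exponentiated sample, Pareto-type

pairs = [(k_plus_location(x, k, a), k_plus_scale(y, k, math.exp(a)))
         for k in range(1, n + 1)]
print(pairs)
print(all(loc == sc for loc, sc in pairs))
```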
4. Goodness-of-Fit Test Results
Many GOF tests for different distributions have been built on characterization results. For example, Rizzo [29], Obradović et al. [30], and Volkova [31] obtained GOF tests for the Pareto distribution in terms of different characteristic properties. According to Nikitin [32], tests based on characterization results are usually more efficient than other tests, because a property unique to the distribution is used in constructing the test statistic.
Let be iid random variables from continuous distribution function . For testing the null hypothesis against for some , two test statistics based on the characterization results of Theorem 2 are presented. According to part (a) of this theorem, the null hypothesis is rejected if there exists “” such that, for all and , equation (24) is not satisfied, i.e., if the value of the following quantity is large: From (9) and assumption , the above expression is equivalent to
Replacing by , the empirical distribution function, a point estimator of (50) can be considered as
Therefore, the test statistic whose large values lead to rejection of the null hypothesis, , is given by
By the same reasoning, another test statistic based on part (b) of Theorem 2 can be defined as where .
It is obvious that and are free of the scale parameter of the Pareto distribution and that their large values lead to rejection of .
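In the rest of this section, the power of the two test statistics and is compared with that of two well-known tests, namely, the Kolmogorov–Smirnov and Cramér–von Mises tests, whose statistics are, respectively,
$$D_n = \max\left(D_n^{+}, D_n^{-}\right), \tag{54}$$
where $D_n^{+} = \max_{1 \leq i \leq n} \left\{ \frac{i}{n} - F_0(X_{i:n}) \right\}$ and $D_n^{-} = \max_{1 \leq i \leq n} \left\{ F_0(X_{i:n}) - \frac{i-1}{n} \right\}$, and
$$W_n^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left( F_0(X_{i:n}) - \frac{2i-1}{2n} \right)^2, \tag{55}$$
with $F_0$ denoting the cdf under the null hypothesis.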
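For reference, the two classical statistics can be computed from their usual order-statistic forms, $D_n = \max(D_n^+, D_n^-)$ and $W_n^2 = 1/(12n) + \sum_i (F_0(x_{i:n}) - (2i-1)/(2n))^2$. The sketch below is an illustration only: `pareto_cdf` assumes the parametrization $F_0(x) = 1 - (\sigma/x)^{\alpha}$ for $x > \sigma$, and all function names are hypothetical.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic max(D+, D-) for a fully specified null cdf."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    return max(d_plus, d_minus)


def cvm_statistic(sample, cdf):
    """Cramer-von Mises statistic 1/(12n) + sum (u_(i) - (2i - 1)/(2n))**2."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    return 1.0 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))


def pareto_cdf(alpha, sigma):
    """Pareto cdf 1 - (sigma/x)**alpha on x > sigma (assumed parametrization)."""
    def cdf(x):
        return 0.0 if x <= sigma else 1.0 - (sigma / x) ** alpha
    return cdf


# Sanity check on a uniform null: points at the midpoints of four equal cells.
midpoints = [0.125, 0.375, 0.625, 0.875]
print(ks_statistic(midpoints, lambda x: x))  # 0.125
print(cvm_statistic(midpoints, lambda x: x))

# Same statistics against a Pareto null with alpha = 2 and sigma = 1.
data = [1.1, 1.3, 1.8, 2.5, 4.0]
print(ks_statistic(data, pareto_cdf(2.0, 1.0)), cvm_statistic(data, pareto_cdf(2.0, 1.0)))
```

For the midpoint sample every summand of the Cramér–von Mises statistic vanishes, so it reduces to $1/(12n)$.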
Since the Pareto distribution is long-tailed, the following alternative distributions, which are long-tailed on the right-hand side, are considered for comparing the power of the statistics , and .
(i) The Weibull distribution with density , denoted by
(ii) The gamma distribution with density , denoted by
(iii) The lognormal distribution with density , denoted by
(iv) The uniform distribution with density , denoted by
(v) The log-gamma distribution with density , denoted by
Since it is not easy to find the null distributions of , and , Monte Carlo simulations with 10000 replications are used to calculate their critical values and power at the 5 percent significance level. Tables 1 and 2 show the results for the null distributions Pareto (1,2) and Pareto (2,2), respectively. Because the statistics and depend on the parameter , one can choose an optimal value of it to maximize the corresponding power. These values are calculated and shown in the tables. Unfortunately, there is no exact method for finding these values, which depend on the supports of the null and alternative distributions.
The values in parentheses in the tables are the estimated significance levels. According to the two tables, the proposed tests are more powerful than the other tests in all reported cases. Even for small sample sizes, the proposed tests perform very well and better than the others.
In the following example, a real data set is used to illustrate how the proposed tests can be applied.
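The Monte Carlo procedure for critical values and power can be sketched generically. Because the formulas of the proposed statistics are not reproduced here, the sketch swaps in the Kolmogorov–Smirnov statistic with Pareto parameters fitted by maximum likelihood (scale estimated by the sample minimum, shape by $n / \sum_i \log(x_i / \hat\sigma)$); the fitting method is an assumption of this example. Reading Pareto (1,2) as shape 1 and scale 2, the lognormal alternative with parameters (0, 1), the sample size, the replication count and the seed are likewise illustrative choices.

```python
import math
import random


def fit_pareto(sample):
    """Maximum likelihood estimates (alpha_hat, sigma_hat) for Pareto type I."""
    sigma_hat = min(sample)
    alpha_hat = len(sample) / sum(math.log(x / sigma_hat) for x in sample)
    return alpha_hat, sigma_hat


def ks_fitted_pareto(sample):
    """Kolmogorov-Smirnov distance between the data and the fitted Pareto cdf."""
    alpha, sigma = fit_pareto(sample)
    u = sorted(1.0 - (sigma / x) ** alpha for x in sample)
    n = len(u)
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))


def critical_value(stat, null_sampler, n, reps, level, rng):
    """Empirical (1 - level) quantile of the statistic under the null."""
    values = sorted(stat(null_sampler(rng, n)) for _ in range(reps))
    return values[min(reps - 1, math.ceil((1 - level) * reps) - 1)]


def rejection_rate(stat, sampler, n, reps, crit, rng):
    """Share of simulated samples whose statistic exceeds the critical value."""
    return sum(stat(sampler(rng, n)) > crit for _ in range(reps)) / reps


def pareto_sampler(alpha, sigma):
    return lambda rng, n: [sigma * rng.paretovariate(alpha) for _ in range(n)]


def lognormal_sampler(mu, s):
    return lambda rng, n: [rng.lognormvariate(mu, s) for _ in range(n)]


rng = random.Random(2020)
n, reps = 20, 2000
null = pareto_sampler(1.0, 2.0)
crit = critical_value(ks_fitted_pareto, null, n, reps, 0.05, rng)
size = rejection_rate(ks_fitted_pareto, null, n, reps, crit, rng)   # should be near 0.05
power = rejection_rate(ks_fitted_pareto, lognormal_sampler(0.0, 1.0), n, reps, crit, rng)
print(f"critical value {crit:.4f}, estimated level {size:.3f}, estimated power {power:.3f}")
```

Any other statistic with the same call signature can be passed as `stat`, which is how a comparison across several tests would be organized.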
Example 3 (An application to real data). The following data represent the time to breakdown of a type of electrical insulating material subject to a constant-voltage stress (Nelson [33]). These data were recently used by Tiku and Akkaya [34], who established that the null hypothesis that the data come from an exponential distribution cannot be rejected at the 10 percent significance level. It is obvious that the data come from a distribution with a long right tail, so the Pareto distribution is another candidate for such data. For testing versus , first the parameters of the Pareto distribution are estimated with shape parameter and scale parameter . Then, based on the data, the values of the proposed statistics (with ), the Kolmogorov–Smirnov statistic, and the Cramér–von Mises statistic have been obtained as follows: Also, with 10000 replications based on the estimated Pareto distribution, the critical values at the 10 percent significance level have been calculated, respectively, Hence, the null hypothesis that the data come from a Pareto distribution cannot be rejected.
Data Availability
No data are included in the study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics, SIAM, Philadelphia, PA, USA, 2008.
[2] B. Eisenberg, G. Stengle, and G. Strang, “The asymptotic probability of a tie for first place,” The Annals of Applied Probability, vol. 3, no. 3, pp. 731–745, 1993.
[3] A. G. Pakes and F. W. Steutel, “On the number of records near the maximum,” Australian Journal of Statistics, vol. 39, no. 2, pp. 179–192, 1997.
[4] A. Dembińska, A. Stepanov, and J. Wesolowski, “How many observations fall in a neighbourhood of an order statistic?” Communications in Statistics: Theory and Methods, vol. 36, pp. 1–17, 2007.
[5] A. G. Pakes and Y. Li, “Limit laws for the number of near maxima via the Poisson approximation,” Statistics & Probability Letters, vol. 40, no. 4, pp. 395–401, 1998.
[6] Y. Li, “A note on the number of records near the maximum,” Statistics & Probability Letters, vol. 43, no. 2, pp. 153–158, 1999.
[7] A. G. Pakes, “The number and sum of near-maxima for thin-tailed populations,” Advances in Applied Probability, vol. 32, no. 4, pp. 1100–1116, 2000.
[8] A. G. Pakes, “Numbers of observations near order statistics,” Australian & New Zealand Journal of Statistics, vol. 51, no. 4, pp. 375–395, 2009.
[9] N. Balakrishnan and A. Stepanov, “A note on the number of observations near an order statistic,” Journal of Statistical Planning and Inference, vol. 134, no. 1, pp. 1–14, 2005.
[10] N. Balakrishnan and A. Stepanov, “Asymptotic properties of numbers of near minimum observations under progressive type-II censoring,” Journal of Statistical Planning and Inference, vol. 138, no. 4, pp. 1010–1020, 2008.
[11] A. Dembińska, “Asymptotic properties of numbers of observations in random regions determined by central order statistics,” Journal of Statistical Planning and Inference, vol. 142, pp. 516–528, 2012.
[12] A. Dembińska, “Limit theorems for proportions of observations falling into random regions determined by order statistics,” Australian & New Zealand Journal of Statistics, vol. 54, pp. 199–210, 2012.
[13] A. Dembińska, “Asymptotic normality of numbers of observations in random regions determined by order statistics,” Statistics, vol. 48, pp. 508–523, 2014.
[14] S. Müller, “Tail estimation based on numbers of near m-extremes,” Methodology and Computing in Applied Probability, vol. 5, pp. 197–210, 2003.
[15] E. Hashorva and J. Hüsler, “Estimation of tails and related quantities using the number of near-extremes,” Communications in Statistics: Theory and Methods, vol. 34, no. 2, pp. 337–349, 2005.
[16] M. Akbari, M. Fashandi, and J. Ahmadi, “Characterizations based on the numbers of near-order statistics,” Statistical Papers, vol. 57, no. 1, pp. 21–30, 2016.
[17] M. Akbari and M. Akbari, “Some applications of near-order statistics in two-parameter exponential distribution,” Journal of Statistical Theory and Applications, vol. 19, no. 1, pp. 21–27, 2020.
[18] J. R. Higgins, Completeness and Basis Properties of Sets of Special Functions, Cambridge University Press, New York, NY, USA, 2004.
[19] M. Y. Lee and S. K. Chang, “Characterizations based on the independence of the exponential and Pareto distributions by record values,” Journal of Applied Mathematics and Computing, vol. 18, pp. 497–503, 2005.
[20] E. E. Afify, “Characterization of mixtures from exponential and Pareto distributions using left truncated moments,” IJSS, vol. 5, pp. 1–18, 2006.
[21] M. Ahsanullah and M. Shakil, “A note on the characterizations of Pareto distribution by upper record values,” Communications of the Korean Mathematical Society, vol. 27, no. 4, pp. 835–842, 2012.
[22] M. Ahsanullah, M. Shakil, and B. M. G. Kibria, “Characterizations of continuous distributions by truncated moment,” Journal of Modern Applied Statistical Methods, vol. 15, no. 1, pp. 316–331, 2016.
[23] Z. M. Nofal and Y. M. El Gebaly, “New characterizations of the Pareto distribution,” Pakistan Journal of Statistics and Operation Research, vol. 13, no. 1, pp. 63–74, 2017.
[24] J. Aczél, Lectures on Functional Equations and Their Applications, Academic Press, New York, NY, USA, 1966.
[25] M. Ahsanullah, M. Shakil, and B. M. G. Kibria, “Characterization of power function distribution based on lower records,” Probstat Forum, vol. 6, pp. 68–72, 2013.
[26] M. I. Khan and M. A. R. Khan, “Characterization of generalized uniform distribution based on lower record values,” Probstat Forum, vol. 10, pp. 23–26, 2017.
[27] E. H. Lim and M. Y. Lee, “A characterization of the power function distribution by independent property of lower record values,” Journal of the Chungcheong Mathematical Society, vol. 26, pp. 269–273, 2013.
[28] M. Tavangar, “Power function distribution characterized by dual generalized order statistics,” JIRSS, vol. 10, pp. 13–27, 2011.
[29] M. L. Rizzo, “New goodness-of-fit tests for Pareto distributions,” Astin Bulletin, vol. 39, no. 2, pp. 691–715, 2009.
[30] M. Obradović, M. Jovanović, and B. Milošević, “Goodness-of-fit tests for Pareto distribution based on a characterization and its asymptotic,” Statistics, vol. 49, pp. 1026–1041, 2015.
[31] K. Y. Volkova, “On asymptotic efficiency of goodness-of-fit tests for the Pareto distribution based on its characterization,” Filomat, vol. 29, pp. 2311–2324, 2015.
[32] Y. Nikitin, “Test based on characterizations, and their efficiencies: a survey,” Acta et Commentationes Universitatis Tartuensis de Mathematica, vol. 21, pp. 3–24, 2017.
[33] W. B. Nelson, “Statistical methods for accelerated life test data: the inverse power law model,” Tech. Rep. 71-C011, General Electric, Schenectady, NY, USA, 1970.
[34] M. L. Tiku and A. D. Akkaya, Robust Estimation and Hypothesis Testing, New Age International Pvt. Ltd., New Delhi, India, 2004.
Copyright © 2020 Masoumeh Akbari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.