Research Article  Open Access
Masoumeh Akbari, "Characterization and Goodness-of-Fit Test of Pareto and Some Related Distributions Based on Near-Order Statistics", Journal of Probability and Statistics, vol. 2020, Article ID 4262574, 9 pages, 2020. https://doi.org/10.1155/2020/4262574
Characterization and Goodness-of-Fit Test of Pareto and Some Related Distributions Based on Near-Order Statistics
Abstract
In this paper, a new definition of the number of observations near the $k$th order statistic is developed. Then some characterization results for the Pareto and some related distributions are established in terms of the probability mass function and the first moment of these new counting random variables, using completeness properties of the sequence of functions $\{x^n\}$. Finally, new goodness-of-fit tests for the Pareto distribution based on these characterizations are presented, and their power values are compared with those of well-known tests such as the Kolmogorov–Smirnov and Cramér–von Mises tests by Monte Carlo simulations.
1. Introduction
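Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the order statistics, that is, the nondecreasing arrangement, of independent and identically distributed (iid) random variables $X_1, \ldots, X_n$ with cumulative distribution function (cdf) $F$. Order statistics play important roles in many areas of statistics and probability, and some of them are especially useful in practice. For example, in actuarial science, the distribution of the minimum of the two lifespans of a couple is important for insurance policy decisions. In industry, specifically in reliability and survival analysis, order statistics are used to solve problems. Meteorology and hydrology are other fields of application. Interested readers can find the theory and applications of order statistics in, for instance, Arnold et al. [1]. Let $F$ be a discrete cdf. Eisenberg et al. [2] first defined the quantity $K_n = \#\{j \in \{1, \ldots, n\} : X_j = X_{n:n}\}$ as the number of winners in a golf competition and studied sufficient conditions under which $K_n$ converges to 1. After that, Pakes and Steutel [3] considered a similar notion for continuous cdf $F$ as follows:
$$K_n(a) = \#\{j \in \{1, \ldots, n\} : X_j \in (X_{n:n} - a, X_{n:n}]\}, \tag{1}$$
where $a > 0$ is a constant. In fact, $K_n(a)$ counts the number of observations in the left-hand neighbourhood of the sample maximum with fixed distance "$a$." Later, $K_n(a)$ was extended to the number of observations near the $k$th order statistic as follows:
$$K_-(n, k, a) = \#\{j \in \{1, \ldots, n\} : X_j \in (X_{k:n} - a, X_{k:n})\}, \tag{2}$$
where the support of $K_-(n, k, a)$ is $\{0, 1, \ldots, k-1\}$. Similarly, the number of observations in the right-hand neighbourhood of the $k$th order statistic was defined as
$$K_+(n, k, a) = \#\{j \in \{1, \ldots, n\} : X_j \in (X_{k:n}, X_{k:n} + a)\}. \tag{3}$$
The support of $K_+(n, k, a)$ is $\{0, 1, \ldots, n-k\}$. For continuous $F$, the probability mass function (pmf) of $K_-(n, k, a)$ is as follows:
$$P(K_-(n, k, a) = i) = \binom{k-1}{i} \int \frac{[F(x) - F(x-a)]^{i}\,[F(x-a)]^{k-1-i}}{[F(x)]^{k-1}} \, dF_{k:n}(x), \tag{4}$$
where $i = 0, 1, \ldots, k-1$ and $F_{k:n}$ is the cdf of $X_{k:n}$. Also, the pmf of $K_+(n, k, a)$ is given by
$$P(K_+(n, k, a) = j) = \binom{n-k}{j} \int \frac{[F(x+a) - F(x)]^{j}\,[1 - F(x+a)]^{n-k-j}}{[1 - F(x)]^{n-k}} \, dF_{k:n}(x), \tag{5}$$
where $j = 0, 1, \ldots, n-k$. For more details, one can refer to Dembińska et al. [4].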
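The location-type counts described above are simple to compute from a sample. The following Python sketch assumes the open-interval conventions $(X_{k:n} - a, X_{k:n})$ and $(X_{k:n}, X_{k:n} + a)$ for the left-hand and right-hand neighbourhoods; the function name `near_order_counts` is illustrative and does not come from the paper.

```python
def near_order_counts(sample, k, a):
    """Count observations near the k-th order statistic (k is 1-indexed).

    Returns (k_minus, k_plus): the numbers of observations falling in the
    open intervals (x_k - a, x_k) and (x_k, x_k + a), where x_k is the k-th
    smallest value of the sample. The interval conventions are an assumption.
    """
    if not 1 <= k <= len(sample):
        raise ValueError("k must satisfy 1 <= k <= n")
    if a <= 0:
        raise ValueError("a must be positive")
    x_k = sorted(sample)[k - 1]
    k_minus = sum(1 for x in sample if x_k - a < x < x_k)
    k_plus = sum(1 for x in sample if x_k < x < x_k + a)
    return k_minus, k_plus


if __name__ == "__main__":
    data = [1.0, 1.2, 1.5, 2.0, 2.1, 3.0]
    # The third order statistic is 1.5; with a = 0.7 the two neighbourhoods
    # are roughly (0.8, 1.5) and (1.5, 2.2).
    print(near_order_counts(data, k=3, a=0.7))
```

By construction the left-hand count can never exceed $k-1$ and the right-hand count can never exceed $n-k$, which matches the supports stated in the text.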
Further results on their distributions, asymptotic properties, and generalizations have been investigated, e.g., by Pakes and Li [5], Li [6], Pakes [7, 8], Balakrishnan and Stepanov [9, 10], Dembińska et al. [4], and Dembińska [11–13]. So far, few researchers have addressed statistical inference based on near-order statistics, e.g., Müller [14], Hashorva and Hüsler [15], Akbari et al. [16], and Akbari and Akbari [17]. In the present paper, a new version of near-order statistics is first defined. Then some characterization results are obtained as a statistical tool for goodness-of-fit (GOF) tests for some continuous distributions. The paper is organized as follows. Section 2 contains preliminary results. The characterization results are given in Section 3. Finally, Section 4 introduces two goodness-of-fit tests for the Pareto distribution. The critical values of the proposed test statistics are computed by Monte Carlo simulations, and their power is compared by simulation with that of well-known tests such as the Kolmogorov–Smirnov and Cramér–von Mises tests. All simulations are carried out in R 3.6.3 with 10000 replications.
2. Preliminary Results
Let be iid random variables from continuous cdf with support , and be one of the distribution functions Pareto, uniform, or power function. A common way to construct characterizations for such is through the pmfs or moments of suitable functions of the random variables. This is not possible, however, with the counting random variables defined in equations (4) and (5), which count the observations in a location-type neighbourhood of a given order statistic, because their pmfs do not have a closed-form expression. Therefore, new counts of the observations falling within the left-hand and right-hand neighbourhoods of a specific order statistic, extended to scale-type neighbourhoods, are introduced, respectively, as follows:where andwhere .
Proposition 1. Using the same arguments given in Dembińska et al. [4], it is concluded that the pmfs of new counting random variables and are the same as the pmfs of and , respectively, with and , that is,
On the other hand, by simple algebra calculations, the first moment of and can be derived, respectively, as follows:
Here, some results as examples of special cases of Pareto and power function distributions are reported that will be useful for obtaining further results in the next section.
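Example 1. Let $X_1, \ldots, X_n$ be iid random variables from Pareto $(\alpha, \beta)$, so that their survival function is given by
$$\bar{F}(x) = \left(\frac{\beta}{x}\right)^{\alpha}, \quad x \ge \beta, \ \alpha > 0, \ \beta > 0. \tag{12}$$
Then from equation (9), the pmf of $K_+^*(n, k, a)$, the number of observations in the right-hand scale-type neighbourhood $(X_{k:n}, aX_{k:n})$ with $a > 1$, is as follows:
$$P(K_+^*(n, k, a) = j) = \binom{n-k}{j} \left(1 - a^{-\alpha}\right)^{j} a^{-\alpha(n-k-j)}, \quad j = 0, 1, \ldots, n-k. \tag{13}$$
Equation (13) shows that $K_+^*(n, k, a)$ has a binomial pmf with parameters $n-k$ and $1 - a^{-\alpha}$. Thus,
$$E\left[K_+^*(n, k, a)\right] = (n-k)\left(1 - a^{-\alpha}\right), \tag{14}$$
which does not depend on the scale parameter $\beta$.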
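The binomial behaviour in Example 1 can be checked numerically. The sketch below is a rough Monte Carlo check written for this text, not the author's code. It assumes the Pareto survival function $(\beta/x)^{\alpha}$ for $x \ge \beta$ and takes the right-hand scale-type neighbourhood to be the open interval $(X_{k:n}, aX_{k:n})$ with $a > 1$. Under these assumptions the count should behave like a Binomial$(n-k, 1-a^{-\alpha})$ variable whatever the value of $\beta$. All function names are hypothetical.

```python
import random


def pareto_sample(n, alpha, beta, rng):
    """Draw n values with survival function (beta / x) ** alpha for x >= beta."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1], never 0.
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def k_plus_star(sample, k, a):
    """Count observations in (x_k, a * x_k), x_k being the k-th smallest value."""
    x_k = sorted(sample)[k - 1]
    return sum(1 for x in sample if x_k < x < a * x_k)


def simulated_mean(n, k, a, alpha, beta, reps=20000, seed=1):
    """Monte Carlo estimate of the mean count under Pareto(alpha, beta)."""
    rng = random.Random(seed)
    total = sum(k_plus_star(pareto_sample(n, alpha, beta, rng), k, a)
                for _ in range(reps))
    return total / reps


def binomial_mean(n, k, a, alpha):
    """Mean of the Binomial(n - k, 1 - a ** (-alpha)) distribution."""
    return (n - k) * (1.0 - a ** (-alpha))


if __name__ == "__main__":
    n, k, a, alpha = 10, 4, 2.0, 1.5
    print("binomial mean:", binomial_mean(n, k, a, alpha))
    for beta in (1.0, 5.0):
        print("beta =", beta, "simulated mean:",
              simulated_mean(n, k, a, alpha, beta))
```

The simulated means for the two values of `beta` should both be close to the binomial mean, illustrating that the count is free of the scale parameter.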
Example 2. It is well known that if ∼ Pareto (, ), then the random variable has a power function distribution with the following cdf:The notation power (, ) is used for the power function distribution with parameters and . It is also called the generalized uniform distribution because it reduces to the standard uniform cdf at and . From (8), the pmf of for random variables that are distributed as power (, ) is obtained asAccording to (16), has a binomial pmf with parameters and success probability . SoThe uniform distribution on the interval is the special case power (1,1), with the following cdf:Therefore, from (16), the pmf of when , equalsand its expectation is
3. Characterization Results
In this section, some characterization results for some continuous distributions, based on distributional properties of the near-order statistics and , are established using the completeness property of a sequence of functions. To this end, some notions and theorems related to this theory are first recalled.
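Definition 1. A sequence $\{\phi_n\}_{n \ge 1}$ of elements of a Hilbert space $H$ is called complete if the only element of $H$ which is orthogonal to every $\phi_n$ is the null element, that is,
$$\langle f, \phi_n \rangle = 0 \quad \text{for all } n \ge 1 \tag{21}$$
implies that $f$ is null.
The notation $\langle f, g \rangle$ denotes the inner product of $f$ and $g$. In the present paper, the Hilbert space $L^2(0,1)$ is considered with the following inner product:
$$\langle f, g \rangle = \int_0^1 f(x)\, g(x)\, dx, \tag{22}$$
where $f$ and $g$ are real-valued square integrable functions on $(0,1)$. One of the complete sequences of functions in $L^2(0,1)$ is $\{x^n, n \ge 1\}$, which is used in this paper.
The following theorem, known as the Müntz theorem, states the necessary and sufficient condition for completeness of the subsequence $\{x^{n_j}, j \ge 1\}$.
Theorem 1 (Higgins [18], page 95). The sequence $\{x^{n_1}, x^{n_2}, \ldots\}$, with $1 \le n_1 < n_2 < \cdots$, in $L^2(0,1)$ is complete if and only if
$$\sum_{j=1}^{\infty} \frac{1}{n_j} = \infty. \tag{23}$$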
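As a brief worked illustration of the Müntz condition, assume it takes its usual form $\sum_j 1/n_j = \infty$ for an increasing sequence of positive integers $n_j$. The two subsequences of exponents below are chosen here only as examples and do not appear in the paper; they behave differently:

```latex
n_j = 2j:\quad \sum_{j=1}^{\infty} \frac{1}{2j} = \infty
  \ \Longrightarrow\ \{x^{2j}\}_{j \ge 1} \text{ is complete in } L^2(0,1),
\\
n_j = j^2:\quad \sum_{j=1}^{\infty} \frac{1}{j^2} = \frac{\pi^2}{6} < \infty
  \ \Longrightarrow\ \{x^{j^2}\}_{j \ge 1} \text{ is not complete in } L^2(0,1).
```

So an orthogonality condition verified only along the even powers already forces a function in $L^2(0,1)$ to be null, whereas one verified only along the squares does not.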
For more details about Hilbert spaces and complete sequences, refer to Higgins [18]. The Pareto distribution has many applications in economics and actuarial science. So far, many of its properties and characterizations based on order statistics or their functions have been obtained, for example, by Lee and Chang [19], Afify [20], Ahsanullah and Shakil [21], Ahsanullah et al. [22], and Nofal and El Gebaly [23]. In the following theorem, some characterizations for the Pareto law in terms of are established.
Theorem 2. Let be iid random variables with continuous cdf whose support is . is Pareto () cdf, if and only if one of the following conditions holds:
(a) There exists such that for all and for , we have
(b) For all , and a fixed , we have
Proof. If sequence has Pareto () cdf, then using equations (13) and (14), one can easily obtain parts (a) and (b). Let part (a) hold. Using the pmf of and the assumptions of (a), the equality in (a) can be rewritten asThe right-hand side of equation (26) can be expressed asOn the other hand, replacing with , the equality in (26) can be stated asTaking the change of variable in the left-hand side of equation (28), it is deduced:By the assumption “” and after some algebraic simplifications in the aforementioned equality, it is concluded thatSince and , by Minkowski’s inequality, we haveIf (30) holds for all , by the completeness property of the sequence , the following identity can be derivedHence,By taking in (33), it can be rewritten asIf (34) holds for all and , using the method of solution given in Aczél [24], it is concluded that function is the general solution of (34). Because and is a survival distribution function, the constant will be . So, the proof is completed.
Suppose that part (b) holds. Then from equation (11), it is deduced thatSince , the last equality, after some simplifications, can be expressed asThe rest of the proof is similar to the proof of part (a).
So far, some results of characterization of power function distribution have been obtained. For example, it was characterized by Ahsanullah et al. [25] through lower records. Also Khan and Khan [26] and Lim and Lee [27] characterized it based on dependency property of lower records. Tavangar [28] presented a characterization of it using dual generalized order statistics. Now, in the next theorem new characterization results of power function distribution are proved.
Theorem 3. Suppose that are iid continuous random variables from cdf with support . Then ’s have power function distribution with cdf (15) if and only if one of the following statements holds.
(a) For all and , there exists such that for and some , we have
(b) For a fixed and for all and some , we have
Proof. Supposing that ’s have power function distribution with cdf (15), parts (a) and (b) follow easily from equations (7) and (16). Let condition (a) be satisfied, thenSince and , the quantity belongs to by Minkowski’s inequality. The completeness property of the sequence and equation (39) result inThis is equivalent toTaking the change of variable in (41) givesThe function is the general solution of the above functional equation. This completes the proof of (a). In a similar way, if (b) holds, one can easily prove that the parent population has a power function distribution.
The results of Theorem 3 can also be obtained directly from Theorem 2 by noticing that ∼ power (, ) if and only if ∼ Pareto (, ). Therefore, for where with superscript denotes the number of observations near the $k$th order statistic of the sequence. In view of the relationship between the power function and standard uniform distributions mentioned before, the following results follow from Theorem 3 and are stated without proof.
Corollary 1. Let be iid continuous random variables from cdf that is supported on . Then, is standard uniform distribution if and only if one of the following statements holds:
(a) For all and , there exists such that for , we have
(b) For a fixed and for all and , we have
Remark 1. According to Theorem 1, it is not necessary that the assumptions of Theorem 2 hold for all “” or “.” This is also true for Theorem 3 and Corollary 1. So, the results of Theorems 2 and 3 and Corollary 1 hold if their assumptions are satisfied for any increasing subsequence or such that condition (23) holds.
Remark 2. Some characterization results for the two-parameter exponential distribution based on the counting random variable have been obtained by Akbari and Akbari [17]. Their characterization results in Section 2 can also be derived directly from Theorems 2 and 3. To see this, suppose that is a random variable having the two-parameter exponential distribution with parameters (, ), denoted by Exp (), with the following cdf:Since ∼Exp () if and only if ∼Pareto (), the following relationship holds between and . For ,Also, ∼Exp () if and only if ∼power (). So for ,
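The link between the exponential and Pareto cases in Remark 2 can be illustrated on simulated data. The sketch below rests on assumptions not stated in the surviving text: the two-parameter exponential is taken to have location `mu` and scale `sigma`, so that the exponentiated variable has a Pareto-type tail; the location-type count uses the interval $(X_{k:n}, X_{k:n} + a)$, and the scale-type count uses $(Y_{k:n}, e^{a} Y_{k:n})$ for $Y = e^{X}$. Because the exponential map is increasing, the two counts coincide sample by sample. All names are hypothetical.

```python
import math
import random


def k_plus(sample, k, a):
    """Location-type count: observations in (x_k, x_k + a)."""
    x_k = sorted(sample)[k - 1]
    return sum(1 for x in sample if x_k < x < x_k + a)


def k_plus_star(sample, k, a):
    """Scale-type count: observations in (x_k, a * x_k)."""
    x_k = sorted(sample)[k - 1]
    return sum(1 for x in sample if x_k < x < a * x_k)


def compare_counts(n, k, a, mu, sigma, seed):
    """Return the location-type count on X and the scale-type count on exp(X)."""
    rng = random.Random(seed)
    # Two-parameter exponential: location mu plus sigma times a unit exponential.
    x = [mu + sigma * rng.expovariate(1.0) for _ in range(n)]
    y = [math.exp(v) for v in x]
    return k_plus(x, k, a), k_plus_star(y, k, math.exp(a))


if __name__ == "__main__":
    for seed in range(5):
        print(compare_counts(n=15, k=5, a=0.3, mu=1.0, sigma=0.5, seed=seed))
```

Each printed pair should contain two equal numbers, which is the sample-level version of the distributional relationship described in the remark.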
4. Goodness-of-Fit Test Results
So far, many GOF tests for different distributions have been obtained using characterization results. For example, Rizzo [29], Obradović et al. [30], and Volkova [31] obtained GOF tests for the Pareto distribution in terms of its different characteristic properties. According to Nikitin [32], tests based on characterization results are usually more efficient than other tests, because a property unique to the distribution is used in constructing the test statistics.
Let be iid random variables from continuous distribution function . For testing the null hypothesis against for some , two test statistics based on the characterization results of Theorem 2 are presented. According to part (a) of this theorem, the null hypothesis will be rejected if there exists “” such that for all and , equation (24) is not satisfied, i.e., if the value of the following quantity is large: From (9) and assumption , the above expression is equivalent to
Replacing by , the empirical distribution function, a point estimator of (50) can be considered as
Therefore, the test statistic , large values of which lead to rejection of the null hypothesis, is given by
By the same argument, another test statistic based on part (b) of Theorem 2 can be defined aswhere .
It is obvious that and are free of the scale parameter of the Pareto distribution and that their large values lead to rejection of the null hypothesis.
In the rest of this section, the power values of the two test statistics and will be compared with those of well-known tests, namely, the Kolmogorov–Smirnov and Cramér–von Mises tests, whose statistics are, respectively,where and , and
Since the Pareto distribution is long-tailed, the following alternative distributions, which are long-tailed on the right-hand side, are considered for comparing the power values of the statistics , and .
(i) The Weibull distribution with density , denoted by
(ii) The gamma distribution with density , denoted by
(iii) The lognormal distribution with density , denoted by
(iv) The uniform distribution with density , denoted by
(v) The log-gamma distribution with density , denoted by
Since it is not easy to find the null distribution of , and , Monte Carlo simulations with 10000 replications are used for calculating their power values and critical values at the 5 percent significance level. Tables 1 and 2 show the results for the null distributions Pareto (1,2) and Pareto (2,2), respectively. Because the statistics and have the parameter , one can choose an optimal to maximize the corresponding power values. So, these values are calculated and shown in the tables. Unfortunately, there is no exact method for finding these values, which depend on the supports of the null and alternative distributions.
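The Monte Carlo procedure described above (simulate under the null to obtain a critical value, then simulate under an alternative to estimate power) can be sketched generically. The example below is only a minimal illustration under stated assumptions and does not reproduce the paper's tables or its proposed statistics. It uses the Kolmogorov–Smirnov distance against a fully specified Pareto null, reads Pareto (1,2) as shape 1 and scale 2, and takes a shifted exponential as a hypothetical alternative. All function names are illustrative.

```python
import math
import random


def pareto_sample(n, alpha, beta, rng):
    """Draw n values with cdf 1 - (beta / x) ** alpha for x >= beta."""
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov distance between the sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def mc_critical_value(stat, sampler, n, reps, level, rng):
    """Empirical (1 - level) quantile of stat over reps simulated samples."""
    values = sorted(stat(sampler(n, rng)) for _ in range(reps))
    index = min(reps - 1, math.ceil((1.0 - level) * reps) - 1)
    return values[index]


def mc_rejection_rate(stat, sampler, crit, n, reps, rng):
    """Fraction of simulated samples whose statistic exceeds crit."""
    hits = sum(1 for _ in range(reps) if stat(sampler(n, rng)) > crit)
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(2020)
    alpha, beta, n, reps, level = 1.0, 2.0, 30, 2000, 0.05
    null_cdf = lambda x: 1.0 - (beta / x) ** alpha if x >= beta else 0.0
    stat = lambda s: ks_statistic(s, null_cdf)
    null_sampler = lambda m, r: pareto_sample(m, alpha, beta, r)
    # Hypothetical alternative: a unit exponential shifted to start at beta.
    alt_sampler = lambda m, r: [beta + r.expovariate(1.0) for _ in range(m)]
    crit = mc_critical_value(stat, null_sampler, n, reps, level, rng)
    print("critical value:", crit)
    print("estimated size:",
          mc_rejection_rate(stat, null_sampler, crit, n, reps, rng))
    print("estimated power:",
          mc_rejection_rate(stat, alt_sampler, crit, n, reps, rng))
```

The same two helper functions can be reused with any other statistic passed as `stat`. If the parameters are estimated from the data, the estimation step would have to be repeated inside every simulated sample, which this sketch does not do.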


The values in parentheses in the tables are estimated significance levels. According to the results in the two tables, the proposed tests are more powerful than the other tests in all cases considered. Even for small sample sizes, the proposed tests perform very well and better than the others.
In the following example, a real data set is used to illustrate how the proposed tests can be applied.
Example 3 (an application to real data). The following data represent the times to breakdown of a type of electrical insulating material subject to a constant-voltage stress (Nelson [33]). These data were used by Tiku and Akkaya [34], who established that the null hypothesis that the data come from an exponential distribution cannot be rejected at the 10 percent significance level. The data appear to come from a distribution with a long tail on the right-hand side, so the Pareto distribution is another candidate for such data. For testing versus , the parameters of the Pareto distribution are first estimated, with shape parameter and scale parameter . Then, based on the data, the values of the proposed statistics (with ), the Kolmogorov–Smirnov statistic, and the Cramér–von Mises statistic have been obtained as follows:Also, with 10000 replications based on the estimated Pareto distribution, the critical values at the 10 percent significance level have been calculated, respectively,Hence, the null hypothesis that the data come from a Pareto distribution cannot be rejected.
Data Availability
No data are included in the study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics, SIAM, Philadelphia, PA, USA, 2008.
[2] B. Eisenberg, G. Stengle, and G. Strang, “The asymptotic probability of a tie for first place,” The Annals of Applied Probability, vol. 3, no. 3, pp. 731–745, 1993.
[3] A. G. Pakes and F. W. Steutel, “On the number of records near the maximum,” Australian Journal of Statistics, vol. 39, no. 2, pp. 179–192, 1997.
[4] A. Dembińska, A. Stepanov, and J. Wesolowski, “How many observations fall in a neighbourhood of an order statistic?” Communications in Statistics: Theory and Methods, vol. 36, pp. 1–17, 2007.
[5] A. G. Pakes and Y. Li, “Limit laws for the number of near maxima via the Poisson approximation,” Statistics & Probability Letters, vol. 40, no. 4, pp. 395–401, 1998.
[6] Y. Li, “A note on the number of records near the maximum,” Statistics & Probability Letters, vol. 43, no. 2, pp. 153–158, 1999.
[7] A. G. Pakes, “The number and sum of near-maxima for thin-tailed populations,” Advances in Applied Probability, vol. 32, no. 4, pp. 1100–1116, 2000.
[8] A. G. Pakes, “Numbers of observations near order statistics,” Australian & New Zealand Journal of Statistics, vol. 51, no. 4, pp. 375–395, 2009.
[9] N. Balakrishnan and A. Stepanov, “A note on the number of observations near an order statistic,” Journal of Statistical Planning and Inference, vol. 134, no. 1, pp. 1–14, 2005.
[10] N. Balakrishnan and A. Stepanov, “Asymptotic properties of numbers of near minimum observations under progressive type-II censoring,” Journal of Statistical Planning and Inference, vol. 138, no. 4, pp. 1010–1020, 2008.
[11] A. Dembińska, “Asymptotic properties of numbers of observations in random regions determined by central order statistics,” Journal of Statistical Planning and Inference, vol. 142, pp. 516–528, 2012.
[12] A. Dembińska, “Limit theorems for proportions of observations falling into random regions determined by order statistics,” Australian & New Zealand Journal of Statistics, vol. 54, pp. 199–210, 2012.
[13] A. Dembińska, “Asymptotic normality of numbers of observations in random regions determined by order statistics,” Statistics, vol. 48, pp. 508–523, 2014.
[14] S. Müller, “Tail estimation based on numbers of near m-extremes,” Methodology and Computing in Applied Probability, vol. 5, pp. 197–210, 2003.
[15] E. Hashorva and J. Hüsler, “Estimation of tails and related quantities using the number of near-extremes,” Communications in Statistics: Theory and Methods, vol. 34, no. 2, pp. 337–349, 2005.
[16] M. Akbari, M. Fashandi, and J. Ahmadi, “Characterizations based on the numbers of near-order statistics,” Statistical Papers, vol. 57, no. 1, pp. 21–30, 2016.
[17] M. Akbari and M. Akbari, “Some applications of near-order statistics in two-parameter exponential distribution,” Journal of Statistical Theory and Applications, vol. 19, no. 1, pp. 21–27, 2020.
[18] J. R. Higgins, Completeness and Basis Properties of Sets of Special Functions, Cambridge University Press, New York, NY, USA, 2004.
[19] M. Y. Lee and S. K. Chang, “Characterizations based on the independence of the exponential and Pareto distributions by record values,” Journal of Applied Mathematics and Computing, vol. 18, pp. 497–503, 2005.
[20] E. E. Afify, “Characterization of mixtures from exponential and Pareto distributions using left truncated moments,” IJSS, vol. 5, pp. 1–18, 2006.
[21] M. Ahsanullah and M. Shakil, “A note on the characterizations of Pareto distribution by upper record values,” Communications of the Korean Mathematical Society, vol. 27, no. 4, pp. 835–842, 2012.
[22] M. Ahsanullah, M. Shakil, and B. M. G. Kibria, “Characterizations of continuous distributions by truncated moment,” Journal of Modern Applied Statistical Methods, vol. 15, no. 1, pp. 316–331, 2016.
[23] Z. M. Nofal and Y. M. El Gebaly, “New characterizations of the Pareto distribution,” Pakistan Journal of Statistics and Operation Research, vol. 13, no. 1, pp. 63–74, 2017.
[24] J. Aczél, Lectures on Functional Equations and Their Applications, Academic Press, New York, NY, USA, 1966.
[25] M. Ahsanullah, M. Shakil, and B. M. Golam Kibria, “Characterization of power function distribution based on lower records,” Probstat Forum, vol. 6, pp. 68–72, 2013.
[26] M. I. Khan and M. A. R. Khan, “Characterization of generalized uniform distribution based on lower record values,” Probstat Forum, vol. 10, pp. 23–26, 2017.
[27] E. H. Lim and M. Y. Lee, “A characterization of the power function distribution by independent property of lower record values,” Journal of the Chungcheong Mathematical Society, vol. 26, pp. 269–273, 2013.
[28] M. Tavangar, “Power function distribution characterized by dual generalized order statistics,” JIRSS, vol. 10, pp. 13–27, 2011.
[29] M. L. Rizzo, “New goodness-of-fit tests for Pareto distributions,” Astin Bulletin, vol. 39, no. 2, pp. 691–715, 2009.
[30] M. Obradović, M. Jovanović, and B. Milošević, “Goodness-of-fit tests for Pareto distribution based on a characterization and its asymptotic,” Statistics, vol. 49, pp. 1026–1041, 2015.
[31] K. Y. Volkova, “On asymptotic efficiency of goodness-of-fit tests for the Pareto distribution based on its characterization,” Filomat, vol. 29, pp. 2311–2324, 2015.
[32] Y. Nikitin, “Test based on characterizations, and their efficiencies: a survey,” Acta et Commentationes Universitatis Tartuensis de Mathematica, vol. 21, pp. 3–24, 2017.
[33] W. B. Nelson, “Statistical methods for accelerated life test data - the inverse power law model,” Tech. Rep., General Electric, Schenectady, NY, USA, 1970, Technical report 71C011.
[34] M. L. Tiku and A. D. Akkaya, Robust Estimation and Hypothesis Testing, New Age International Pvt. Ltd., New Delhi, India, 2004.
Copyright
Copyright © 2020 Masoumeh Akbari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.