Sequential Test for a Mixture of Finite Exponential Distribution

Al-Moisheer, A.S.

doi:https://doi.org/10.1155/2021/6625853

Journal of Mathematics


Research Article | Open Access

Volume 2021 | Article ID 6625853 | https://doi.org/10.1155/2021/6625853


Academic Editor: Markos Koutras

Received 21 Dec 2020

Revised 15 Mar 2021

Accepted 30 Mar 2021

Published 19 Apr 2021

Abstract

Testing the number of components in a finite mixture is a challenging problem. In this paper, we consider finite exponential mixtures and determine the number of components in the mixture. A sequential testing procedure is adopted based on the likelihood ratio test (LRT) statistic. The distribution of the test statistic under the null hypothesis is obtained using a resampling technique based on B bootstrap samples, and the quantiles of this distribution are evaluated from the B bootstrap samples. The performance of the test is examined through its empirical power and through applications to two real datasets. The proposed procedure is used not only for testing the number of components but also for estimating the optimal number of components in a finite exponential mixture distribution. The innovation of this paper is the sequential test, which tests the more general hypothesis of a finite exponential mixture of $k$ components versus a mixture of $k + 1$ components. The special case of testing an exponential mixture of one component versus two components is the one commonly used in the literature.

1. Introduction

The exponential distribution, which is analytically very simple, plays an important role in reliability and life testing, analogous to that of the normal distribution in other areas. Consequently, the exponential distribution has become a basic model in research associated with experiments on life expectancy. Applications of the exponential distribution include designing acceptance sampling plans [1], estimating reliability in multicomponent stress-strength models [2], and constructing multivariate control charts [3]. Also, neutrosophic statistics is applied when the data have uncertain parameters or values [4, 5]. One reason to study this distribution in mixtures is its lack-of-memory property.

Many authors have discussed mixture models such as Everitt and Hand [6]; Titterington et al. [7]; McLachlan and Basford [8]; Lindsay [9]; McLachlan and Krishnan [10]; and McLachlan and Peel [11].

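Let $x_1, x_2, \ldots, x_n$ be a random sample arising from a mixture of finite exponential distribution (MFED), whose density is

$$f(x; \Theta) = \sum_{j=1}^{k} p_j f_j(x; \theta_j), \quad x \geq 0, \tag{1}$$

where $k$ is the number of exponential components, $p_j \geq 0$ with $\sum_{j=1}^{k} p_j = 1$, $\Theta = (p_1, \ldots, p_{k-1}, \theta_1, \ldots, \theta_k)$, and $f_j$ represents the density function of the $j$th component,

$$f_j(x; \theta_j) = \frac{1}{\theta_j} e^{-x/\theta_j}, \quad \theta_j > 0. \tag{2}$$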
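The mixture density and the way a sample arises from it can be illustrated with a minimal Python sketch. It assumes the mean parameterization of each exponential component (consistent with the sample mean being the MLE for one component in Section 2). The function names are hypothetical, and reading the parameter triple (0.9, 0.35, 5) from the simulation section as (first mixing proportion, first mean, second mean) is an assumption.

```python
import math
import random


def mixture_pdf(x, weights, means):
    """Density of a finite exponential mixture at x (mean parameterization assumed)."""
    if x < 0:
        return 0.0
    return sum(p * math.exp(-x / m) / m for p, m in zip(weights, means))


def sample_mixture(n, weights, means, rng):
    """Draw n observations: choose a component by its weight, then draw from it."""
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    # random.expovariate takes a rate, so pass 1 / mean.
    return [rng.expovariate(1.0 / means[j]) for j in comps]


rng = random.Random(1)
weights, means = [0.9, 0.1], [0.35, 5.0]  # assumed reading of (0.9, 0.35, 5)
data = sample_mixture(1000, weights, means, rng)
sample_mean = sum(data) / len(data)
mixture_mean = sum(p * m for p, m in zip(weights, means))
```

The sample mean of a large simulated sample should lie close to the mixture mean, which is the weighted average of the component means.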

Inference on the number of components can be conducted through statistical tests such as likelihood ratio tests. Several papers have dealt with bootstrapping the LRT. McLachlan [12] and Feng and McCulloch [13] used bootstrap resampling to test a single normal distribution against a mixture of two normal distributions; Feng and McCulloch [13] also noted that bootstrap resampling is a preferred method for normal mixtures with different variances. Seidel, Mosler, and Alker [14, 15] used a mixture of two exponential distributions, and Sultan, Ismail, and Al-Moisheer [16] used a mixture of two inverse Weibull distributions. The discrete Poisson case was treated by Karlis and Xekalaki [17]. Some criteria for choosing the number of components in finite mixture models are discussed by McLachlan and Peel [11]. Various authors have considered the simplest form of the LRT, a single component against a two-component model. Here, the test procedure is proposed for $k$ components against $k + 1$ components.

This paper is arranged into six sections. In Section 2, an algorithm is presented to determine the number of finite exponential components by bootstrapping the LRT in R. Section 3 contains the simulation results from computing the quantiles for an estimated number of finite exponential components using the sequential test. In Section 4, we evaluate the power of the sequential test when determining the number of finite exponential components. In Section 5, the sequential test procedure is applied to two real datasets, and likelihood-based criteria, namely the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Hannan–Quinn information criterion (HQIC), and the consistent Akaike information criterion (CAIC), are used to estimate the number of components in the finite mixture models. Finally, Section 6 presents the conclusions.

2. Determining the Number of Components in the Finite Exponential Mixture

In this section, we use a sequential test to specify the number of components in the finite exponential mixture by using a resampling procedure called the bootstrap. McLachlan [12] and Feng and McCulloch [13] discussed the idea of bootstrapping the LRT. The general method for determining the number of components is based on the LRT, whose statistic is an appropriate test statistic for testing the hypotheses. The test statistic is defined as $-2\log\lambda$, where $\lambda$ represents the ratio between the maximized likelihood functions under the null hypothesis $H_0$ and the alternative hypothesis $H_1$, respectively, $\lambda = L(\hat{\Theta}_0)/L(\hat{\Theta}_1)$. Equivalently, the test statistic can be written as $-2\log\lambda = 2[\log L(\hat{\Theta}_1) - \log L(\hat{\Theta}_0)]$, where $\hat{\Theta}$ is the maximum likelihood estimator (MLE) of the parameter $\Theta$.
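As a worked special case, the statistic for one component against two has a closed form for the null log-likelihood. The derivation below is a sketch that assumes the mean parameterization of the exponential density, $f(x;\theta) = \theta^{-1} e^{-x/\theta}$. The subscripts on $L_1$ and $L_2$ (one-component and two-component likelihoods) are notation introduced here for illustration.

```latex
\begin{align*}
\log L_1(\theta) &= -n\log\theta - \frac{1}{\theta}\sum_{i=1}^{n} x_i, \\
\hat{\theta} = \bar{x} \;&\Rightarrow\; \log L_1(\hat{\theta}) = -n\,(\log\bar{x} + 1), \\
-2\log\lambda &= 2\left[\log L_2(\hat{\Theta}_2) + n\,(\log\bar{x} + 1)\right].
\end{align*}
```

Only the two-component log-likelihood $\log L_2(\hat{\Theta}_2)$ then requires numerical maximization.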

Consider the hypothesis $H_0$: the number of components in the exponential finite mixture is $k$ against $H_1$: the number of components in the exponential finite mixture is $k + 1$. The testing procedure is sequential for $k = 1, 2, \ldots$, using the LRT statistic. Bootstrapping the LRT proceeds as follows.

Set $k = 1$.

(1) Find the MLEs of the parameters $\Theta$ of the finite exponential mixture for $k$ and $k + 1$ components and calculate the LRT statistic $-2\log\lambda$. For the case $k = 1$, the MLE of $\theta$ is $\hat{\theta} = \bar{x}$, the sample mean.

(2) Generate a bootstrap sample of size $n$ ($n$ is the sample size) from the exponential mixture of $k$ components and calculate the value of $-2\log\lambda$ after obtaining the MLEs of $\Theta$ under $H_0$ and $H_1$. The EM algorithm for a finite mixture of exponential distributions is used, as mentioned in [15], with the updates

$$\tau_{ij} = \frac{p_j f_j(x_i; \theta_j)}{f(x_i; \Theta)}, \qquad p_j^{\text{new}} = \frac{1}{n}\sum_{i=1}^{n} \tau_{ij}, \qquad \theta_j^{\text{new}} = \frac{\sum_{i=1}^{n} \tau_{ij}\, x_i}{\sum_{i=1}^{n} \tau_{ij}},$$

where $p_j$ and $\theta_j$ are the mixing proportion and mean of the $j$th component, and $f$ and $f_j$ are given in equations (1) and (2), respectively.

(3) The process is repeated independently $B$ times.

(4) The $j$th order statistic of the $B$ replications of $-2\log\lambda$ can be taken as an estimate of the quantile of order $j/(B + 1)$.

(5) The bootstrap replications can also be used to provide a test of approximate size $\alpha$, where $\alpha = 1 - j/(B + 1)$.

(6) If the observed value of $-2\log\lambda$ exceeds the $j$th order statistic of the $B$ replications, then the null hypothesis $H_0$ is rejected and we set $k = k + 1$ and go to step (1). Otherwise, the optimal number of components is $k$ and the test is terminated. For more details, see McLachlan and Peel [11].
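The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's R implementation: it assumes the mean parameterization of the exponential components, uses hypothetical starting values and a fixed number of EM iterations in place of a convergence rule, and caps the search at a hypothetical `k_max`. All function names are illustrative.

```python
import math
import random


def component_terms(x, weights, means):
    """Weighted component densities p_j * f_j(x), mean parameterization assumed."""
    return [p * math.exp(-x / t) / t for p, t in zip(weights, means)]


def fit_em(data, k, iters=200):
    """Fit a k-component exponential mixture by EM; return (weights, means, loglik)."""
    n = len(data)
    xbar = sum(data) / n
    if k == 1:
        weights, means = [1.0], [xbar]  # closed-form MLE: the sample mean
    else:
        # Hypothetical starting values: equal weights, means spread around xbar.
        weights = [1.0 / k] * k
        means = [xbar * 2.0 ** (j - (k - 1) / 2.0) for j in range(k)]
        for _ in range(iters):
            # E-step: posterior probability that x_i came from component j.
            resp = []
            for x in data:
                terms = component_terms(x, weights, means)
                total = max(sum(terms), 1e-300)
                resp.append([t / total for t in terms])
            # M-step: update mixing proportions and component means.
            for j in range(k):
                nj = sum(r[j] for r in resp)
                weights[j] = nj / n
                if nj > 1e-12:
                    new_mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
                    means[j] = max(new_mean, 1e-12)
    loglik = sum(
        math.log(max(sum(component_terms(x, weights, means)), 1e-300)) for x in data
    )
    return weights, means, loglik


def lrt_statistic(data, k):
    """-2 log(lambda) for H0: k components versus H1: k + 1 components."""
    loglik0 = fit_em(data, k)[2]
    loglik1 = fit_em(data, k + 1)[2]
    return max(0.0, 2.0 * (loglik1 - loglik0))  # clamp in case EM stops short


def sample_mixture(n, weights, means, rng):
    """Parametric bootstrap draw from a fitted mixture."""
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.expovariate(1.0 / means[j]) for j in comps]


def bootstrap_pvalue(data, k, n_boot, rng):
    """Observed statistic and bootstrap p value for H0: k components."""
    weights, means, _ = fit_em(data, k)
    observed = lrt_statistic(data, k)
    reps = [
        lrt_statistic(sample_mixture(len(data), weights, means, rng), k)
        for _ in range(n_boot)
    ]
    exceed = sum(r >= observed for r in reps)
    return observed, (exceed + 1) / (n_boot + 1)


def sequential_test(data, alpha=0.05, n_boot=99, k_max=4, seed=0):
    """Increase k until H0 is not rejected (or k_max is reached)."""
    rng = random.Random(seed)
    k = 1
    while k < k_max:
        _, p_value = bootstrap_pvalue(data, k, n_boot, rng)
        if p_value > alpha:
            break
        k += 1
    return k


# Toy usage on simulated single-exponential data (small sizes to keep it fast).
toy_rng = random.Random(42)
toy = [toy_rng.expovariate(1.0) for _ in range(40)]
k_hat = sequential_test(toy, alpha=0.05, n_boot=19, k_max=3, seed=1)
```

Rejecting when the p value `(exceed + 1) / (n_boot + 1)` is at most `alpha` is equivalent to rejecting when the observed statistic exceeds the $j$th order statistic with $\alpha = 1 - j/(B + 1)$. A single EM start is used here for brevity; several starts would reduce the risk of stopping at a local maximum.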

3. Simulation Results

To obtain the distribution of the LRT statistic under the null hypothesis $H_0$, we use data simulated for the sequential test from a mixture of univariate exponential distributions. For this purpose, we use an R package that provides a set of functions for analyzing a variety of finite mixture models. The package is used to generate a random sample from a mixture of univariate exponential distributions. We then require the MLE of the mixing distribution, which is obtained with the package's model-fitting function.

The number of the finite exponential components test is applied to choose the number of components . To calculate the quantiles of the LRT tests, we simulate the null distribution of for the sample sizes at , and 100 according to the stopping criteria described in [14] by using as the level of accuracy. Each distribution is generated for the parameters in the model with 500 bootstrap samples. Tables 1–3 present the quantiles at the significance levels , , and . The test always rejected for each and the number of components at sample size , and the test was repeated for five times, as shown in Tables 2 and 3. The epitome from the simulation is at , sample sizes are , and the choices of parameters are (0.8, 0.15, 0.5, 2, 4). Then, the best estimates for the optimal values for are , respectively. Further, simulation results depend on the following factors.

As shown in Tables 1–3, to calculate the quantiles, when the sample sizes increase, so do the values of the quantiles; for example, in Table 1, at in the choice of parameter (0.9, 0.35, 5) for sample size , we get the result of for the five repetitions and once at the significant level of , while for , we get the results for the significant levels of . The same goes for and (see Figure 1; see also , which shows how the sample size affects the value of the quantiles and the level of acceptance for ). With respect to the initial values of the parameters, as the initial vector consists of the simulation results reveal that when there is a large difference between the initial of the parameter the number of rejected ones decreases and get the quantile results for high values of . This is clear at with the parameters and the maximum number of accepted quantiles is in , while in the parameters choices , the accepted values of the LRT at the levels of significance are and 0.5 for sample size . For the mixing proportion, when the initial value of or the sum of is closer to 1, the number of rejected ones decreases. This is clear when we compare the results of the parameters at . Then, the accepted values of the level of significance are up to for a large sample size . On the other hand, for the mixing proportions , the accepted values of are up to 0.5 in the same sample. Finally, according to the increasing number of components and the increasing levels of significance (), it is obvious in the tables for the LRT that the maximum level of significance () at is 0.9 and at it is 0.5, even though at , it is 0.1.

4. The Power of the Sequential Test

The empirical power of the sequential test of $k$ components against $k + 1$ is evaluated for several sample sizes, including $n = 100$. The empirical power is defined as the proportion of times $H_0$ was rejected when the data were generated under $H_1$. The power is simulated using 500 bootstrap samples for different choices of parameters and is calculated at several significance levels $\alpha$.
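The definition of empirical power can be illustrated with a short Monte Carlo sketch in Python. To keep the example small and fast, it does not use the bootstrap LRT. Instead it uses a plainly different stand-in test that rejects a single exponential when the sample coefficient of variation is large, with a simulated critical value. The stand-in test, the function names, and the mixture parameters (0.9, 0.1; means 0.35, 5) are assumptions for illustration only.

```python
import random
import statistics


def coefficient_of_variation(data):
    return statistics.stdev(data) / statistics.mean(data)


def draw_exponential(n, rng):
    """Data under H0: a single exponential (the CV does not depend on the scale)."""
    return [rng.expovariate(1.0) for _ in range(n)]


def draw_mixture(n, rng, weights=(0.9, 0.1), means=(0.35, 5.0)):
    """Data under H1: a two-component exponential mixture."""
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.expovariate(1.0 / means[j]) for j in comps]


def rejection_rate(reject, generate, n_rep, rng):
    """Proportion of simulated datasets for which the test rejects H0."""
    return sum(reject(generate(rng)) for _ in range(n_rep)) / n_rep


rng = random.Random(7)
n, alpha = 50, 0.05

# Stand-in critical value: simulated (1 - alpha) quantile of the CV under H0.
null_cvs = sorted(
    coefficient_of_variation(draw_exponential(n, rng)) for _ in range(2000)
)
critical = null_cvs[int((1 - alpha) * len(null_cvs)) - 1]


def reject(data):
    return coefficient_of_variation(data) > critical


size = rejection_rate(reject, lambda r: draw_exponential(n, r), 500, rng)
power = rejection_rate(reject, lambda r: draw_mixture(n, r), 500, rng)
```

The same `rejection_rate` function applies to any test: replacing `reject` with the bootstrap LRT decision gives the empirical power studied in this section, at a much higher computational cost.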

For each case, we study the effect of increasing the distance between the parameters on the power of the test. Also, we study the effect of increasing the mixing proportion and the sample size for the test. The power results are shown in Tables 4–6. For each case, when the sample size increases, the powers improve for every component . To test the components against , the power is increased for large sample sizes, starting from and over. For the components against for small sample sizes of , the power is always decreased. Performances of empirical power are affected by the sample sizes and not the reverse (see Figures 2–4).

5. Application

The sequential test procedure is applied to two real datasets as follows.

5.1. Application (1)

The data considered in this application are given by Maswadah [18]. They represent the maximum flood levels (in millions of cubic feet per second) of the Susquehanna River in Harrisburg, Pennsylvania, over 20 four-year periods (1890–1969). The observations are as follows: 0.654, 0.613, 0.315, 0.449, 0.297, 0.402, 0.379, 0.423, 0.379, 0.324, 0.269, 0.740, 0.418, 0.412, 0.494, 0.416, 0.338, 0.392, 0.484, 0.265.

5.2. Application (2)

This is an application of the sequential test to fitting exponential mixtures. According to Smith and Naylor [19], the following data represent the strength of 1.5 cm glass fibers.

Data ($n = 63$): 0.55, 0.93, 1.25, 1.36, 1.49, 1.52, 1.58, 1.61, 1.64, 1.68, 1.73, 1.81, 2.00, 0.74, 1.04, 1.27, 1.39, 1.49, 1.53, 1.59, 1.61, 1.66, 1.68, 1.76, 1.82, 2.01, 0.77, 1.11, 1.28, 1.42, 1.50, 1.54, 1.60, 1.62, 1.66, 1.69, 1.76, 1.84, 2.24, 0.81, 1.13, 1.29, 1.48, 1.50, 1.55, 1.61, 1.62, 1.66, 1.70, 1.77, 1.84, 0.84, 1.24, 1.48, 1.30, 1.51, 1.55, 1.61, 1.63, 1.67, 1.70, 1.78, 1.89.

We apply the sequential test of the number of components in a finite exponential mixture to the above two real datasets, with sample sizes $n = 20$ and 63, respectively. The sequential results for the two applications are given in Tables 7 and 8. Column 1 in Tables 7 and 8 contains the number of components $k$ in the finite mixture of exponentials. Column 2 contains the values of the LRT statistic for testing $H_0$: $k$ components versus $H_1$: $k + 1$ components. The $p$ values of the test, which lie between 0 and 1, are obtained by simulating the LRT under $H_0$ with 500 bootstrap samples, as described previously. The last four columns contain the information criteria used to choose the number of components in finite mixture models, namely AIC, BIC, HQIC, and CAIC. These criteria agree with the LRT results and the $p$ values. Our results in Tables 7 and 8 lead to the selection of the mixture model with one component. It can also be seen from Tables 7 and 8 that the LRT increases when the number of components decreases, while the information criteria increase as the number of components increases. In brief, the best mixture model is the one with $k = 1$ because it has the largest $p$ value and the minimum values of the four criteria (Algorithm 1).
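The four information criteria can be computed from a maximized log-likelihood as in the following Python sketch. The formulas are the commonly used definitions and are an assumption here, since the text does not state them. In particular, CAIC is taken in Bozdogan's form, and the number of free parameters of a $k$-component exponential mixture is taken as $q = 2k - 1$. The one-component log-likelihood uses the closed form $-n(\log\bar{x} + 1)$, which assumes the mean parameterization. The function name is hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC, HQIC, and CAIC for a k-component exponential mixture."""
    q = 2 * k - 1  # assumed count: k means plus k - 1 free mixing proportions
    return {
        "AIC": -2.0 * loglik + 2.0 * q,
        "BIC": -2.0 * loglik + q * math.log(n),
        "HQIC": -2.0 * loglik + 2.0 * q * math.log(math.log(n)),
        "CAIC": -2.0 * loglik + q * (math.log(n) + 1.0),  # Bozdogan's form (assumed)
    }


# Flood-level data from Application (1), fitted with a single exponential.
flood = [0.654, 0.613, 0.315, 0.449, 0.297, 0.402, 0.379, 0.423, 0.379, 0.324,
         0.269, 0.740, 0.418, 0.412, 0.494, 0.416, 0.338, 0.392, 0.484, 0.265]
n = len(flood)
xbar = sum(flood) / n
loglik_k1 = -n * (math.log(xbar) + 1.0)  # closed form at the MLE (sample mean)
criteria_k1 = information_criteria(loglik_k1, k=1, n=n)
```

Smaller values indicate a preferred model. If the paper's CAIC follows a different definition, the numbers produced here will not match the corresponding column of the tables.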

6. Conclusion

In this paper, the sequential testing of the number of components in a finite exponential mixture is discussed. At the same time, the optimal number of components for a finite exponential mixture is determined to provide an appropriate fit to the data. A resampling approach based on B bootstrap samples is used to determine the number of components. Bootstrap samples are generated from the finite exponential mixture under $H_0$, and the value of $-2\log\lambda$ is evaluated for each bootstrap sample. This process is repeated for 500 bootstrap samples to obtain the order statistic as an estimate of the quantile. The power results for the estimated number of finite exponential components are computed. Two applications to real data are used to illustrate the sequential test. The innovation in this sequential test is that it permits testing the hypothesis of $k$ components in an exponential mixture against $k + 1$ components, along with the determination of the optimal exponential mixture. It thus provides a more general method than the one commonly used for exponential mixtures, which focuses on testing one component versus a mixture of two components. The importance of this sequential test lies in cluster analysis and other applications. Finally, our sequential test, which was applied here to finite exponential mixtures, can in principle be applied to finite mixture models from any family of distributions.

	Algorithm 1
	Put $k = 1$.
	Step 1: find the MLEs of the parameters of the finite exponential components for $k$ and $k + 1$, respectively, by using the code in R.
	For $k = 1$, $\hat{\theta} = \bar{x}$.
	Otherwise, the MLEs are obtained with the EM algorithm,
	and calculate the LRT statistic, say $-2\log\lambda$.
	Step 2: declare the initial values of the variables.
	Step 3: generate a random sample of size $n$ from the mixture of univariate exponential distributions.
	Step 4: declare the function LRT to calculate the maximized log-likelihoods under $H_0$ and $H_1$, and then calculate the LRT by using the EM algorithm equations for finite exponential components, as mentioned in [15].
	Step 5: simulate 500 bootstrap samples of size $n$ with the initial vector of parameters and, for each bootstrap sample, calculate $-2\log\lambda$ for the LRT using the boot package functions in R.
	Step 6: find the control variate estimates from the bootstrap output object. Estimate the quantiles using the linear approximation as a control variate.
	Step 7: estimate the $p$ values from the result.
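The last two steps can be illustrated in Python with plain order statistics. This sketch does not reproduce the control variate estimates of Step 6. It only shows the simpler rule in which the $j$th order statistic of the $B$ replications estimates the quantile of order $j/(B + 1)$. The function names and the toy replications are hypothetical.

```python
def bootstrap_quantile(reps, alpha):
    """Critical value: the j-th order statistic with alpha = 1 - j / (B + 1)."""
    b = len(reps)
    j = round((1.0 - alpha) * (b + 1))
    if not 1 <= j <= b:
        raise ValueError("alpha is not attainable with this number of replications")
    return sorted(reps)[j - 1]


def bootstrap_p_value(observed, reps):
    """Share of replications at least as large as the observed statistic."""
    exceed = sum(r >= observed for r in reps)
    return (exceed + 1) / (len(reps) + 1)


# Toy replications 1, 2, ..., 99 (B = 99), used only to show the indexing.
toy_reps = [float(v) for v in range(1, 100)]
critical_value = bootstrap_quantile(toy_reps, alpha=0.05)  # 95th order statistic
p_value = bootstrap_p_value(97.5, toy_reps)
```

With $B = 99$ and $\alpha = 0.05$, the rule selects $j = 95$, so the critical value is the fifth largest replication.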

Data Availability

The data used to support the findings of this study were obtained from Maswadah [18] and Smith and Naylor [19].

Conflicts of Interest

The author declares that there are no conflicts of interest.

References

[1] W. Gui and M. Aslam, “Acceptance sampling plans based on truncated life tests for weighted exponential distribution,” Communications in Statistics - Simulation and Computation, vol. 46, no. 3, pp. 2138–2151, 2017.
[2] G. S. Rao, M. Aslam, and O. H. Arif, “Estimation of reliability in multicomponent stress-strength based on two parameter exponential Weibull distribution,” Communications in Statistics: Theory and Methods, vol. 46, no. 15, pp. 7495–7502, 2017.
[3] N. Khan, M. Aslam, M. S. Aldosari, and C.-H. Jun, “A multivariate control chart for monitoring several exponential quality characteristics using EWMA,” IEEE Access, vol. 6, no. 1, pp. 70349–70358, 2018.
[4] M. Aslam and O. H. Arif, “Testing of grouped product for the Weibull distribution using neutrosophic statistics,” Symmetry, vol. 10, no. 9, p. 403, 2017.
[5] M. Aslam, “Design of sampling plan for exponential distribution under neutrosophic statistical interval method,” IEEE Access, vol. 6, no. 1, pp. 64153–64158, 2018.
[6] B. S. Everitt and D. J. Hand, Finite Mixture Distribution, Chapman & Hall, London, UK, 1981.
[7] D. M. Titterington, A. F. M. Smith, and U. E. Makov, Statistical Analysis of Finite Mixture Distribution, Wiley & Sons, Chichester, UK, 1985.
[8] G. J. McLachlan and K. E. Basford, Mixture Models: Applications to Clustering, Marcel Dekker, New York, NY, USA, 1988.
[9] B. G. Lindsay, Mixture Models: Theory, Geometry and Applications, The Institute of Mathematical Statistics, Hayward, CA, USA, 1995.
[10] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, John Wiley & Sons, New York, NY, USA, 1997.
[11] G. J. McLachlan and D. Peel, Finite Mixture Models, John Wiley & Sons, New York, NY, USA, 2000.
[12] G. J. McLachlan, “On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture,” Applied Statistics, vol. 36, no. 3, pp. 318–324, 1987.
[13] Z. D. Feng and C. E. McCulloch, “Using bootstrap likelihood ratios in finite mixture models,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 3, pp. 609–617, 1996.
[14] W. Seidel, K. Mosler, and M. Alker, “A cautionary note on likelihood ratio tests in mixture models,” Annals of the Institute of Statistical Mathematics, vol. 52, no. 3, pp. 481–487, 2000a.
[15] W. Seidel, K. Mosler, and M. Alker, “Likelihood ratio tests based on subglobal optimization: a power comparison in exponential mixture models,” Statistical Papers, vol. 41, no. 1, pp. 85–98, 2000b.
[16] K. S. Sultan, M. A. Ismail, and A. S. Al-Moisheer, “Testing the number of components of the mixture of two inverse Weibull distributions,” International Journal of Computer Mathematics, vol. 86, no. 4, pp. 693–702, 2009.
[17] D. Karlis and E. Xekalaki, “On testing for the number of components in a mixed Poisson model,” Annals of the Institute of Statistical Mathematics, vol. 51, no. 1, pp. 149–162, 1999.
[18] M. Maswadah, “Conditional confidence interval estimation for the inverse Weibull distribution based on censored generalized order statistics,” Journal of Statistical Computation and Simulation, vol. 73, no. 12, pp. 887–898, 2003.
[19] R. L. Smith and J. C. Naylor, “A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution,” Applied Statistics, vol. 36, no. 3, pp. 358–369, 1987.

Copyright

Copyright © 2021 A.S. Al-Moisheer. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
