Abstract

The exponentiated generalized Gull alpha power exponential (EGGAPE) distribution is an extension of the exponential distribution that can model data with a variety of hazard function shapes. However, the change point problem has not been studied for this distribution. In this study, change point detection for the parameters of the EGGAPE distribution is studied using the modified information criterion (MIC). In addition, the binary segmentation procedure is used to identify multiple change point locations. All the parameters of the EGGAPE distribution are assumed to be changeable. A simulation study is conducted to illustrate the power of the MIC in detecting a change point in the parameters for different sample sizes. Three applications to COVID-19 data demonstrate the applicability of the MIC in detecting change points in real-life scenarios.

1. Introduction

The concept of a change point is important in statistical analysis because it helps to identify the points at which the distribution of a time series changes. Change point analysis is of great interest in real-life settings such as health science, finance, and survival analysis. Change point inference involves determining whether a change point exists and then estimating the number and the positions of the change points. Since the inception of change point analysis, many studies have been conducted. Sen and Srivastava [1, 2] proposed a statistic for detecting a change in the mean of normally distributed variables and derived its exact and asymptotic distributions. A change point in a binomial probability model, together with the power of the likelihood ratio and cumulative sum tests, was investigated in [3]. In the context of distributions, Ngunkeng and Ning [4] studied the change point problem for the generalized lambda distribution. For an exponential distribution characterized by repeated values, change point identification was carried out in [5]. Hassan et al. [6] obtained the weighted power Lomax distribution and its length-biased version. Shrahili et al. [7] discussed the alpha power moment exponential model with applications to biomedical science. Chen and Gupta [8] carried out an extensive study of parametric statistical change point analysis and gave applications in finance, medicine, and genetics. Change point detection using the information approach for regular models was explored in [9]. Arellano-Valle et al. [10] considered the change point problem for the skew normal distribution using the Bayesian approach. Alghamdi et al. [11] studied the Rayleigh Lomax distribution and used the information approach to identify potential change points in the parameters. Almetwally [12] carried out an extensive study of the odd Weibull inverse Topp–Leone distribution with applications to COVID-19 data. Recently, Ratnasingam [13] carried out an extensive study combining the modified information approach and the confidence distribution for the skew normal distribution.

In statistics, a change point is an unknown time point such that the observations follow different distributions before and after it. Formally, let $X_1, X_2, \ldots, X_n$ be a series of independent random variables with CDFs $F_1, F_2, \ldots, F_n$, respectively. The change point problem entails testing the null hypothesis

$$H_0: F_1 = F_2 = \cdots = F_n$$

against the alternative

$$H_1: F_1 = \cdots = F_{k_1} \neq F_{k_1+1} = \cdots = F_{k_2} \neq \cdots \neq F_{k_q+1} = \cdots = F_n,$$

where $q$ denotes the number of change points and $1 \le k_1 < k_2 < \cdots < k_q < n$ are the unknown locations of the change points that are to be estimated. Importantly, if $F_1, F_2, \ldots, F_n$ are from the same parametric family, the change point problem becomes a test of the null hypothesis on the population parameters $\theta_i$, $i = 1, \ldots, n$, stated as

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_n$$

versus the alternative

$$H_1: \theta_1 = \cdots = \theta_{k_1} \neq \theta_{k_1+1} = \cdots = \theta_{k_2} \neq \cdots \neq \theta_{k_q+1} = \cdots = \theta_n.$$

The exponentiated generalized Gull alpha power exponential (EGGAPE) distribution was recently developed in [14]. It is flexible, and its hazard function can take various shapes depending on the values of the shape parameters. The probability distribution of the EGGAPE distribution is given by where are the shape parameters and is the scale parameter. The case where is the exponential distribution, the case where and is the exponentiated exponential distribution, and the case where is the exponentiated generalized exponential distribution. Several authors have investigated the change point problem for several distributions. ElSherpieny and Almetwally [15] introduced the exponentiated generalized alpha power family. Jandhyala et al. [16] proposed a change point methodology to identify changes in the parameters of the two-parameter Weibull distribution. The statistic they developed was a likelihood ratio test that was used to detect unknown changes in the parameters and to locate the change points. Almongy et al. [17] discussed the likelihood function for a multicomponent stress-strength model under the power Lomax distribution. Hafez et al. [18] studied the likelihood of single and multiple ramp progressive stress with binomial removal. The model was applied to temperature data.

Jarušková [19] tested for the presence of a change point in a three-parameter Weibull distribution using the log-likelihood statistic. Ratnasingam [20] proposed a procedure built on the MIC and the confidence distribution for detecting and estimating changes in a three-parameter Weibull distribution. To detect and locate changes in all the parameters of the four-parameter EGGAPE distribution simultaneously, we present a methodology based on the information approach, specifically the modified information criterion (MIC) and the Schwarz information criterion (SIC). The proposed method can be applied to a variety of parametric distributions as long as the regularity and Wald requirements are met.

The rest of the study is organized as follows. In Section 2, we look at approaches based on the MIC and SIC for detecting simultaneous changes in all parameters. In Section 3, simulations are run for a variety of parameter values and sample sizes in order to examine the test’s power. Section 4 shows how the algorithm was applied to three COVID-19 datasets to demonstrate change point detection. The results and areas for additional research are presented in Section 5.

2. Methodology

2.1. Information Approach

This section presents the methodology applied to detect possible change points. The modified information criterion (MIC) and the Schwarz information criterion (SIC) are discussed. The change point problem generally involves both the estimation of parameters and the testing of hypotheses. More specifically, the null hypothesis that there is no change point is tested against the alternative hypothesis that at least one change point exists. The use of model selection criteria is one of the most popular methods for change point detection. The SIC was developed in [21]. As pointed out in [9], the SIC technique does not take into account how the change location contributes to the model’s complexity, which might lead to redundancy in the parameter space. To address this shortcoming, Chen et al. [9] proposed the MIC, which adjusts the SIC penalty term to reflect the contribution of the change point location to model complexity.

Let $X_1, X_2, \ldots, X_n$ be a random sample from a density function $f(x; \theta)$. The SIC is defined as

$$\mathrm{SIC}(n) = -2 \log L(\hat{\theta}) + p \log n, \tag{6}$$

where $L(\hat{\theta})$ is the likelihood function of the model evaluated at the maximum likelihood estimate $\hat{\theta}$, $n$ is the sample size, and $p$ is the number of parameters in the model. We denote by $\theta_1$ and $\theta_2$ the parameters before and after the change point, and by $k$ the unknown change point location. When at least one change point is present, the SIC is

$$\mathrm{SIC}(k) = -2 \log L(\hat{\theta}_{1k}, \hat{\theta}_{2k}) + (2p + 1) \log n, \tag{7}$$

where $1 \le k < n$ and $\hat{\theta}_{1k}$ and $\hat{\theta}_{2k}$ are the maximum likelihood estimates based on the first $k$ and the last $n - k$ observations, respectively. Equation (7) applies the same penalty to every change position, which could result in redundancy in the parameter space if the change happens near the end or beginning of the data. Writing $\ell_n(\theta_1, \theta_2, k)$ for the log likelihood of a model with parameters $\theta_1$ up to position $k$ and $\theta_2$ afterwards, the MIC under the null hypothesis is given as

$$\mathrm{MIC}(n) = -2 \ell_n(\hat{\theta}, \hat{\theta}, n) + p \log n, \tag{8}$$

where $\hat{\theta}$ maximizes the log likelihood $\ell_n(\theta, \theta, n)$. The MIC under the alternative hypothesis is defined as

$$\mathrm{MIC}(k) = -2 \ell_n(\hat{\theta}_{1k}, \hat{\theta}_{2k}, k) + \left[ 2p + \left( \frac{2k}{n} - 1 \right)^2 \right] \log n, \tag{9}$$

where $1 \le k < n$.

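If

$$\mathrm{MIC}(n) > \min_{1 \le k < n} \mathrm{MIC}(k), \tag{10}$$

choose the model that has a change point, and the location of the change point is estimated by $\hat{k}$ such that

$$\mathrm{MIC}(\hat{k}) = \min_{1 \le k < n} \mathrm{MIC}(k). \tag{11}$$

The MIC test statistic, which is used to determine the statistical significance of a change point, is defined as follows:

$$S_n = \mathrm{MIC}(n) - \min_{1 \le k < n} \mathrm{MIC}(k) + p \log n, \tag{12}$$

where $p$ is the number of parameters in the model.

Chen et al. [9] showed that, as $n \to \infty$,

$$S_n \to \chi^2_p \tag{13}$$

in distribution under the null hypothesis. The SIC test statistic is given as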

The asymptotic distribution of the statistic in (14) is the type I extreme value distribution.
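The MIC procedure described above can be illustrated with a minimal Python sketch. Because the EGGAPE density is not reproduced here, the sketch assumes a one-parameter exponential model (a special case of the EGGAPE family) as a stand-in, so that the segment MLEs have a closed form and the reference distribution is chi-square with one degree of freedom. The function names `exp_loglik` and `mic_test` are hypothetical and are not taken from the paper.

```python
import math

def exp_loglik(total, count):
    """Maximized exponential log-likelihood of a segment with the given sum and size."""
    rate = count / total                      # closed-form MLE of the rate
    return count * math.log(rate) - rate * total

def mic_test(x):
    """Return (S_n, k_hat, p_value) for one change in an exponential rate.

    Assumption: exponential stand-in with p = 1 parameter per segment.
    """
    n = len(x)
    p = 1
    log_n = math.log(n)
    total = sum(x)
    mic_null = -2.0 * exp_loglik(total, n) + p * log_n          # MIC(n)

    best_mic, k_hat, left = math.inf, None, 0.0
    for k in range(1, n):                                       # 1 <= k < n
        left += x[k - 1]
        loglik = exp_loglik(left, k) + exp_loglik(total - left, n - k)
        penalty = (2 * p + (2 * k / n - 1) ** 2) * log_n        # MIC(k) penalty
        mic_k = -2.0 * loglik + penalty
        if mic_k < best_mic:
            best_mic, k_hat = mic_k, k

    s_n = mic_null - best_mic + p * log_n                       # test statistic S_n
    # Chi-square survival function with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(max(s_n, 0.0) / 2.0))
    return s_n, k_hat, p_value

# Illustrative data only: the mean drops from 1.0 to 0.1 after position 20.
sample = [1.0] * 20 + [0.1] * 20
s_n, k_hat, p_value = mic_test(sample)
print(s_n, k_hat, p_value)
```

For a four-parameter model such as the EGGAPE distribution, the closed-form segment fit would be replaced by a numerical maximizer and the reference distribution would have four degrees of freedom; the structure of the search over $k$ stays the same.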

2.2. MIC and SIC Detection Approach for EGGAPE Distribution

In this section, the study focuses on the change point problem, using the MIC and SIC approaches to detect changes in the parameters of the EGGAPE distribution defined in Equation (1). Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables from the EGGAPE distribution with scale parameter and shape parameters , and . The null hypothesis is versus where denotes the unknown location to be estimated.

Under $H_0$, the SIC and MIC are defined as where are the MLEs of the scale parameter and shape parameters , respectively, fitted to the whole dataset.

The log likelihood function under $H_0$ is

To obtain the MLEs of , we let and so that :

The partial derivatives of the log-likelihood function with respect to are given by:

The parameter estimates for are obtained by setting equations (22)–(25) equal to zero and solving the resulting system of nonlinear equations.

Under $H_1$, the SIC and MIC are defined, respectively, as where , and are the MLEs of , and , respectively, fitted to the first segment of data and , and are the MLEs of , and , respectively, fitted to the second segment of data.

3. Simulation

In this section, simulations are carried out to assess the test’s power in two scenarios: when there is a change point and when there is not. First, we conduct the simulation for the SIC and MIC when there is a change point.

3.1. Simulation Study: When There Is Change Point

In this section, the change point problem for the scale and shape parameters of the EGGAPE distribution is studied. To calculate the statistics , and , the bbmle package developed in [22] was used to fit the EGGAPE distribution to a dataset since the first derivatives of the , , and

We repeat the simulation 1000 times under the with different values of the shape parameter and the scale parameter . The test statistics and are calculated and compared to the critical values corresponding to the significance level 0.05.

After rejecting the null hypothesis, the powers of the SIC and MIC for different sample sizes , and different change locations are shown in Tables 1–4. The EGGAPE parameters change as , , , and .

The purpose of the simulated power study is to verify the accuracy of detecting the change point at different locations. As indicated in Tables 1–4, the power increases as the change point location moves toward the middle of the data. The MIC has higher power to detect the change point than the SIC.
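The power computation described above can be sketched as a small Monte Carlo experiment. The sketch below is illustrative only: it assumes an exponential stand-in for the EGGAPE model (so the segment fits are available in closed form), uses the 0.05 chi-square critical value with one degree of freedom (3.841), and uses 200 replications rather than 1000 to keep it fast. The names `mic_statistic` and `rejection_rate` are hypothetical.

```python
import math
import random

def mic_statistic(x):
    """S_n for one change in an exponential rate (stand-in for the EGGAPE fit)."""
    n = len(x)
    log_n = math.log(n)
    total = sum(x)

    def seg_loglik(s, m):
        # Maximized exponential log-likelihood of a segment with sum s and size m
        return m * math.log(m / s) - m

    null_ll = seg_loglik(total, n)
    best, left = -math.inf, 0.0
    for k in range(1, n):
        left += x[k - 1]
        lr = 2.0 * (seg_loglik(left, k) + seg_loglik(total - left, n - k) - null_ll)
        # Algebraically equal to MIC(n) - MIC(k) + p*log(n)
        best = max(best, lr - (2 * k / n - 1) ** 2 * log_n)
    return best

def rejection_rate(n, k, rate_before, rate_after, reps=200, crit=3.841, seed=1):
    """Fraction of simulated samples in which the no-change hypothesis is rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(rate_before) for _ in range(k)]
        x += [rng.expovariate(rate_after) for _ in range(n - k)]
        hits += mic_statistic(x) > crit
    return hits / reps

# Illustrative settings only (not the parameter values used in the paper).
power_with_change = rejection_rate(n=60, k=30, rate_before=1.0, rate_after=5.0)
rate_without_change = rejection_rate(n=60, k=30, rate_before=1.0, rate_after=1.0)
print(power_with_change, rate_without_change)
```

Varying `k` and `n` in `rejection_rate` mirrors the way the tables vary the change location and the sample size.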

Compared with the traditional SIC, the MIC has higher power for the EGGAPE distribution, as shown in Figures 1 and 2. The MIC clearly has higher power when the change point is located in the middle of the dataset. This is because the penalty term for the change location in the MIC differs from the constant value of 1 used in the SIC.

If the change point is located near the start or the end of the data, that is, $k$ is close to $1$ or $n$, then

$$\left( \frac{2k}{n} - 1 \right)^2 \approx 1,$$

and the MIC penalty is very close to that of the SIC. However, when the change point is in the middle of the dataset, that is, $k = n/2$,

$$\left( \frac{2k}{n} - 1 \right)^2 = 0,$$

so this quadratic term vanishes and the penalty term of the MIC is smaller than that of the SIC. It is easier to reject the null hypothesis and detect a change in the data when the information criterion under the alternative gets smaller.
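To make the comparison of penalty terms concrete, the following short sketch evaluates the location-dependent MIC penalty $[2p + (2k/n - 1)^2] \log n$ against a constant-location SIC penalty $(2p + 1) \log n$. The SIC form is an assumption based on the value of 1 that the text attributes to the SIC; $p = 4$ corresponds to the four EGGAPE parameters, and $n = 100$ is an arbitrary illustrative sample size. The function names are hypothetical.

```python
import math

def sic_penalty(n, p):
    """Penalty with a constant contribution of 1 for the change location (assumed SIC form)."""
    return (2 * p + 1) * math.log(n)

def mic_penalty(n, p, k):
    """Penalty whose change-location contribution shrinks toward the middle of the data."""
    return (2 * p + (2 * k / n - 1) ** 2) * math.log(n)

n, p = 100, 4   # p = 4 EGGAPE parameters; n is illustrative
for k in (1, 25, 50, 75, 99):
    gap = sic_penalty(n, p) - mic_penalty(n, p, k)
    print(f"k={k:3d}  SIC penalty minus MIC penalty = {gap:.3f}")
```

The gap is largest at $k = n/2$, where it equals $\log n$, and shrinks toward zero as $k$ approaches either end of the data.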

The main difference between the SIC and the MIC is that the MIC has higher power than the SIC to detect a change that happens in the middle of the dataset, as displayed in Figures 1 and 2.

The following conclusions can be drawn from the simulation study when there is a change point:
(i) As the change point location approaches the middle of the data, the power of the test increases.
(ii) When the difference between the parameters increases, the power of the test increases.
(iii) As the sample size increases, the power of the test also increases.
(iv) Since the MIC has higher power than the SIC, we use the MIC to detect change points in the real data applications.

3.2. Simulation Study: When There Is No Change Point

In this section, we conduct a simulation study to investigate the power of the test when there is no change point in the parameters of the distribution. The EGGAPE parameters are held fixed as , , , and .

We repeat the simulation 1000 times under with different values of the shape parameter and the scale parameter . The results for both the SIC and the MIC are given in Tables 5 and 6 when and , respectively.

The corresponding plots for the case with no change point are given in Figures 3 and 4.

The following conclusions can be drawn from the simulation study when there is no change point. For a sample size of and ,
(i) The power of the test is low for both the SIC and the MIC.
(ii) The power of the test is higher when there is a change point than when there is none, indicating that the test correctly identifies a change point.

4. Application to COVID-19 Data

In this section, we present applications to COVID-19 data for Italy, the UK, and Mexico. Other studies that analyzed COVID-19 data include [17, 23–30].

4.1. Italy COVID-19 Data

This section presents the change point analysis of COVID-19 death rate data for Italy over a 59-day period from February 2 to April 25, 2020. The data were obtained from https://COVID-19.who.int/ and are given in Table 7.

A time-series plot of the dataset is displayed in Figure 5. The mortality rate is calculated as

All the parameters are considered changeable.

To identify a change point in the dataset of the Italy Mortality rates, we apply the test statistics defined in Equation (10). The results are displayed in Table 8.

From Table 8, $H_0$ is rejected, and we conclude that a change point exists at position 40, that is, at MIC (40), which corresponds to a mortality rate of 5.073 and to the date 2020-04-06. Based on the binary segmentation method, the dataset was then separated into two parts: the first part (1 : 40) and the second part (41 : 59). Within the first part, a second change point is identified at , which corresponds to the mortality rate on 2020-03-30. Repeating the same procedure, a further change point is located at , corresponding to 2020-02-29, after which no further change points were located. Next, we analyze the second segment (41 : 59). Here, a change point is located at , corresponding to 2020-04-18. Repeating the same procedure, no further change points were located.
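The binary segmentation steps followed above can be sketched as a short recursive routine. This is a minimal illustration under stated assumptions, not the authors' implementation: it again uses an exponential stand-in for the EGGAPE fit, the 0.05 chi-square critical value with one degree of freedom, and a hypothetical `min_len` guard that stops splitting very short segments. A reported location $k$ means the series is split into (1 : k) and (k + 1 : n), matching the (1 : 40), (41 : 59) convention used in the text.

```python
import math

def best_split(x):
    """Return (S_n, k_hat) of the MIC test for one exponential-rate change."""
    n = len(x)
    log_n = math.log(n)
    total = sum(x)

    def seg_loglik(s, m):
        # Maximized exponential log-likelihood of a segment with sum s and size m
        return m * math.log(m / s) - m

    null_ll = seg_loglik(total, n)
    best, k_hat, left = -math.inf, None, 0.0
    for k in range(1, n):
        left += x[k - 1]
        lr = 2.0 * (seg_loglik(left, k) + seg_loglik(total - left, n - k) - null_ll)
        stat = lr - (2 * k / n - 1) ** 2 * log_n
        if stat > best:
            best, k_hat = stat, k
    return best, k_hat

def binary_segmentation(x, start=0, crit=3.841, min_len=4):
    """Recursively test and split; return sorted 1-based change locations."""
    if len(x) < 2 * min_len:          # hypothetical guard: segment too short to split
        return []
    s_n, k_hat = best_split(x)
    if s_n <= crit:                   # no significant change in this segment
        return []
    left = binary_segmentation(x[:k_hat], start, crit, min_len)
    right = binary_segmentation(x[k_hat:], start + k_hat, crit, min_len)
    return left + [start + k_hat] + right

# Illustrative series with three regimes (not the COVID-19 data).
series = [1.0] * 20 + [0.1] * 20 + [2.0] * 20
print(binary_segmentation(series))
```

Each recursive call applies the single change point test to one segment only, so the procedure stops on a segment as soon as the test fails to reject there.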

Figure 6 shows the change point locations for the Italy COVID-19 mortality rate. Possible reasons for the change points in the Italy COVID-19 mortality data are given in Table 9.

The change points divide the period into three segments. The first segment, from 2020-02-29 to 2020-03-30, is characterized by high mortality rates. The second segment, from 2020-03-30 to 2020-04-06, is characterized by a decline in mortality rates, and the third segment, from 2020-04-06 to 2020-04-18, is characterized by a further decline in mortality rates.

4.2. Change Point Analysis for UK Data

This section describes the change point analysis of COVID-19 mortality rate data from the United Kingdom for a period of 76 days, from March 12 to July 15, 2020. The data were obtained from https://COVID-19.who.int/ and are presented in Table 10.

A time-series plot of the dataset is displayed in Figure 7. The mortality rate is calculated as

All the parameters are considered changeable.

To identify the change point in the dataset of the UK mortality rates, we apply the test statistics defined in Equation (10). The results are displayed in Table 11.

The change point locations in the dataset are displayed in Figure 8.

The possible causes of the change point in the UK COVID-19 mortality rate data are given in Table 12.

4.3. Change Point Analysis for Mexico Data

This section presents the change point analysis of COVID-19 death rates for Mexico over a period of 108 days, from 4 March to 19 June 2020. The data were obtained from https://COVID-19.who.int/ and are given in Table 13.

A time-series plot of the dataset is displayed in Figure 9. The mortality rate is calculated as

To detect the change point in the dataset of the Mexico Mortality rates, we apply the test statistics defined in Equation (10). The results are displayed in Table 14.

The change point locations in the dataset are displayed in Figure 10.

The possible causes of the change point in the Mexico COVID-19 mortality rate data are given in Table 15.

5. Conclusions

Although the EGGAPE distribution is a flexible distribution that can describe data with monotonic and nonmonotonic hazard shapes, few or no studies of the change point problem for this distribution have been carried out. In this study, we present a change point detection method for the four-parameter EGGAPE distribution based on the information approach, specifically the modified information criterion (MIC). All the parameters are considered changeable. The benefit of using an MIC-based test is that it avoids deriving the complicated asymptotic distributions of the likelihood ratio and cumulative sum test statistics. In addition, we have applied binary segmentation to detect multiple change points and their locations. In the simulation study of the power of the test, two scenarios were considered: one with a change point and one without. When there was a change point, the power of the test was high, and when there was no change point, the power of the test was low. The testing procedure was applied to three real datasets of COVID-19 mortality rates in Italy, the United Kingdom, and Mexico. Multiple change points were successfully identified and located. In this study, we have only considered the case where all the parameters change. Future work could consider the case where at least one of the parameters does not change.

Data Availability

The data used to support the findings of the study are available within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This study was funded by the Researchers Supporting Project (no. RSP2022R488), King Saud University, Riyadh, Saudi Arabia.