Abstract

This paper introduces a new discrete statistical model for coronavirus disease 2019 (COVID-19) mortality numbers in Saudi Arabia and Latvia. The model, called the discrete extended odd Weibull exponential (DEOWE) distribution, has three parameters and is obtained by discretizing the extended odd Weibull exponential distribution, which combines the exponential distribution with the extended odd Weibull family. We derive some statistical properties of the new distribution, such as a linear representation of its probability mass function and its quantile function. The maximum likelihood estimation (MLE) method is used to estimate the unknown parameters of the DEOWE distribution. Three datasets of COVID-19 mortality numbers, from Saudi Arabia and Latvia, are used to illustrate the importance of the proposed distribution for fitting and modeling this kind of discrete data. Graphical plots of the data are also provided to support our results.

1. Introduction

Modeling pandemics is important because it helps researchers understand how each virus spreads and how it affects humanity. Recently, a new virus has come to the fore: severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19. This virus has attracted many researchers, who have tried to model daily COVID-19 deaths around the world. For example, Al-Babtain et al. [1] introduced a natural discrete Lindley distribution and studied the mortality numbers in Egypt from 8 March to 30 April 2020. Hasab et al. [2] studied COVID-19 mortality numbers in Egypt, using the susceptible–infected–recovered (SIR) epidemic dynamics to model COVID-19 infections. Algarni et al. [3] discussed the type-I half-logistic Burr X-G family with an application to COVID-19 data. Almetwally [4] discussed the odd Weibull inverse Topp–Leone distribution with applications to COVID-19 data. Almetwally et al. [5] discussed a new distribution with applications to the COVID-19 mortality rate in two different countries. El-Morshedy et al. [6] studied a new discrete distribution, called the discrete generalized Lindley distribution, to analyze the counts of daily COVID-19 cases in Hong Kong and daily new deaths in Iran. Maleki et al. [7] used an autoregressive time-series model based on the two-piece scale mixture normal distribution to forecast recovered and confirmed COVID-19 cases. Nesteruk [8] forecast the daily new COVID-19 cases in China with the SIR mathematical model. Batista [9] used a logistic growth regression model to estimate the final size and the peak time of the COVID-19 epidemic. Muse et al. [10] modeled the COVID-19 mortality rate with a new versatile modification of the log-logistic distribution. Liu et al. [11] presented a new statistical model, the arcsine-modified Weibull distribution, for modeling COVID-19 patients’ data.

Afify and Mohamed [12] developed the extended odd Weibull exponential (EOWE) distribution for modeling data in many fields, such as architecture, medicine, and reliability. The EOWE distribution is flexible: its density can be left-skewed, symmetric, right-skewed, or reversed-J shaped; see Alshenawy et al. [13]. Its hazard rate function (HRF) can be decreasing, constant, increasing, upside-down bathtub, or J-shaped, and bathtub and modified bathtub hazard rates are of great importance in reliability studies. For more details, see Alshenawy et al. [13]. Most distributions that can produce such a variety of hazard shapes require four or five parameters. The DEOWE distribution has only three parameters, and because its HRF and cumulative distribution function (CDF) have simple closed forms, it can also be used to analyze censored data.

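The CDF and probability mass function (PMF) of the DEOWE distribution are given, respectively, by

$$F(x;\alpha,\beta,\lambda)=1-\left[1+\beta\left(e^{\lambda(x+1)}-1\right)^{\alpha}\right]^{-1/\beta},\qquad x=0,1,2,\ldots,$$

and

$$p(x;\alpha,\beta,\lambda)=\left[1+\beta\left(e^{\lambda x}-1\right)^{\alpha}\right]^{-1/\beta}-\left[1+\beta\left(e^{\lambda(x+1)}-1\right)^{\alpha}\right]^{-1/\beta},\qquad x=0,1,2,\ldots,$$

where $\alpha,\beta,\lambda>0$.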

A natural question is why discrete distributions are needed at all. Daily death counts and daily new cases are nonnegative integers that display strong dispersion, and most existing continuous distributions do not give reliable results when used to model such COVID-19 counts.

Many authors have introduced discrete distributions to overcome the deficiencies of continuous distributions in modeling mortality numbers. For example, Para and Jan [14] introduced the discrete Burr type XII and discrete Lomax (DL) distributions; the DL distribution has heavy tails and can be helpful in medical science and other fields. Other examples are the discrete Burr (DB) distribution of Krishna and Pundir [15], the discrete Lindley distribution of Gómez-Déniz and Calderín-Ojeda [16], the discrete generalized exponential (DGEx) distribution of Nekoukhou et al. [17], the natural discrete Lindley (NDL) distribution of Al-Babtain et al. [1], and the discrete Gompertz exponential (DGzEx) distribution of El-Morshedy et al. [6]. Gillariose et al. [18] introduced the discrete Weibull Marshall–Olkin family of distributions and studied its properties, characterizations, and applications. The discrete Marshall–Olkin generalized exponential distribution was presented by Almetwally et al. [19]. Al-Babtain et al. [20] discussed parameter estimation for two discrete models, the discrete Poisson–Lindley and discrete Lindley distributions, with some applications.

A variety of methods can convert a continuous distribution into a discrete one. The survival discretization approach is the most widely used technique for generating discrete distributions. It requires a continuous, nonnegative survival function (equivalently, an underlying continuous CDF) and divides the time axis into unit intervals. In Roy [21], the PMF of the resulting discrete distribution is

$$P(X=x)=S(x;\Theta)-S(x+1;\Theta),\qquad x=0,1,2,\ldots,$$

where $S(x;\Theta)=1-F(x;\Theta)$, $F(x;\Theta)$ is the CDF of a continuous distribution, and $\Theta$ is a vector of parameters. The random variable $X$ is said to have the discrete distribution if its CDF is given by $F_X(x)=1-S(x+1;\Theta)$. The hazard rate is $h(x)=P(X=x)/S(x;\Theta)$, and the reversed failure rate of the discrete distribution is $r(x)=P(X=x)/\left[1-S(x+1;\Theta)\right]$.
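To make the discretization rule concrete, the following Python sketch (our illustration, not code from the paper) builds the PMF, CDF, hazard rate, and reversed failure rate from any continuous survival function S. As a check, it assumes an exponential baseline S(t) = exp(−λt), whose discretized version is the geometric distribution with P(X = x) = (1 − q)q^x, q = e^{−λ}, and constant hazard 1 − q; the value λ = 0.7 is arbitrary.

```python
import math
from typing import Callable, Tuple

Fn = Callable[[int], float]


def discretize(survival: Callable[[float], float]) -> Tuple[Fn, Fn, Fn, Fn]:
    """Survival discretization: p(x) = S(x) - S(x + 1) for x = 0, 1, 2, ..."""

    def pmf(x: int) -> float:
        return survival(x) - survival(x + 1)

    def cdf(x: int) -> float:
        # F_X(x) = P(X <= x) = 1 - S(x + 1)
        return 1.0 - survival(x + 1)

    def hazard(x: int) -> float:
        # h(x) = P(X = x) / P(X >= x), and P(X >= x) = S(x)
        return pmf(x) / survival(x)

    def reversed_hazard(x: int) -> float:
        # r(x) = P(X = x) / P(X <= x)
        return pmf(x) / cdf(x)

    return pmf, cdf, hazard, reversed_hazard


# Exponential baseline S(t) = exp(-lam * t); its discretized version is geometric.
lam = 0.7
q = math.exp(-lam)
pmf, cdf, hazard, reversed_hazard = discretize(lambda t: math.exp(-lam * t))
print(pmf(3), (1 - q) * q ** 3)  # equal up to rounding
print(hazard(5), 1 - q)          # constant hazard of the geometric law
```

The same `discretize` helper accepts any survival function, so the DEOWE case only requires passing the EOWE survival function instead of the exponential one.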

Our motivation is to find a statistical model that fits COVID-19 mortality numbers in Saudi Arabia and Latvia well. To this end, we introduce a new discrete model, the DEOWE distribution. The unknown parameters are estimated with the MLE method, and we also estimate the expected number of deaths per day.

The remainder of this article is organized as follows. Section 2 defines the DEOWE distribution. Section 3 derives a linear representation of its PMF, along with some of its statistical properties. Section 4 uses the MLE method for parameter estimation. Section 5 presents a simulation study that assesses the performance of the estimators relative to the true parameter values through their relative bias (Rbias) and mean square error (MSE). Section 6 analyzes three real datasets of mortality numbers and shows the efficiency of the DEOWE distribution relative to other distributions by evaluating information criteria, chi-square statistics, and their p-values. Finally, Section 7 gives the conclusions and major findings.

2. DEOWE Distribution

In this section, we define the DEOWE distribution and obtain its PMF and CDF. Figures 1 and 2 show the PMF and HRF of the distribution for different parameter values.

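The DEOWE distribution is obtained with the survival discretization method. Let $S(x;\Theta)=\left[1+\beta\left(e^{\lambda x}-1\right)^{\alpha}\right]^{-1/\beta}$ denote the survival function of the baseline EOWE model with parameter vector $\Theta=(\alpha,\beta,\lambda)$. Then the CDF of the DEOWE distribution is

$$F(x;\Theta)=1-S(x+1;\Theta)=1-\left[1+\beta\left(e^{\lambda(x+1)}-1\right)^{\alpha}\right]^{-1/\beta},\qquad x=0,1,2,\ldots \tag{4}$$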

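The corresponding PMF of (4) is

$$p(x;\Theta)=\left[1+\beta\left(e^{\lambda x}-1\right)^{\alpha}\right]^{-1/\beta}-\left[1+\beta\left(e^{\lambda(x+1)}-1\right)^{\alpha}\right]^{-1/\beta},\qquad x=0,1,2,\ldots, \tag{5}$$

where $\alpha,\beta,\lambda>0$ are parameters. A random variable $X$ with PMF (5) is denoted by $X\sim\mathrm{DEOWE}(\alpha,\beta,\lambda)$. The corresponding HRF of the DEOWE distribution is

$$h(x;\Theta)=\frac{p(x;\Theta)}{S(x;\Theta)}=1-\left[\frac{1+\beta\left(e^{\lambda(x+1)}-1\right)^{\alpha}}{1+\beta\left(e^{\lambda x}-1\right)^{\alpha}}\right]^{-1/\beta}. \tag{6}$$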
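The DEOWE quantities can be evaluated numerically as in the following sketch. It assumes the parameterization S(x) = [1 + β(e^{λx} − 1)^α]^{−1/β} for the baseline EOWE survival function, with parameters named `alpha`, `beta`, and `lam`; the function names are our own and the parameter values in the usage note are arbitrary. `np.expm1` is used for e^{λx} − 1 to keep precision when λx is small.

```python
import numpy as np


def deowe_sf(x, alpha, beta, lam):
    """Baseline EOWE survival S(x) = [1 + beta*(e^{lam x} - 1)^alpha]^(-1/beta)."""
    x = np.asarray(x, dtype=float)
    return (1.0 + beta * np.expm1(lam * x) ** alpha) ** (-1.0 / beta)


def deowe_cdf(x, alpha, beta, lam):
    """DEOWE CDF (4): F(x) = 1 - S(x + 1), x = 0, 1, 2, ..."""
    return 1.0 - deowe_sf(np.asarray(x) + 1, alpha, beta, lam)


def deowe_pmf(x, alpha, beta, lam):
    """DEOWE PMF (5): p(x) = S(x) - S(x + 1)."""
    x = np.asarray(x)
    return deowe_sf(x, alpha, beta, lam) - deowe_sf(x + 1, alpha, beta, lam)


def deowe_hrf(x, alpha, beta, lam):
    """DEOWE HRF: h(x) = p(x) / S(x) = 1 - S(x + 1) / S(x)."""
    x = np.asarray(x)
    return 1.0 - deowe_sf(x + 1, alpha, beta, lam) / deowe_sf(x, alpha, beta, lam)
```

For example, `deowe_pmf(np.arange(10), 1.5, 0.5, 0.3)` gives the first ten probabilities, and their running sum matches `deowe_cdf` at the same points. Computing the HRF as 1 − S(x + 1)/S(x) avoids dividing two small differences in the far tail.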

3. Mathematical Properties

This section presents a linear representation of the PMF of the DEOWE distribution and its quantile function.

3.1. Linear Representation

In this subsection, we derive a linear representation of the PMF of the proposed distribution. Linear representations are usually used to derive statistical properties of a model; here, unfortunately, the resulting form does not correspond to any known statistical model and is mathematically difficult to use for that purpose. For the proposed distribution, the linear representation takes three different forms, depending on the values of x and of the parameters.

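For $|z|<1$, we have the following expansion:

$$(1+z)^{-1/\beta}=\sum_{k=0}^{\infty}\binom{-1/\beta}{k}z^{k},\qquad \binom{-1/\beta}{k}=\frac{(-1)^{k}\,\Gamma(1/\beta+k)}{k!\,\Gamma(1/\beta)}.$$

For $z>1$, we have the following expansion:

$$(1+z)^{-1/\beta}=z^{-1/\beta}\left(1+z^{-1}\right)^{-1/\beta}=\sum_{k=0}^{\infty}\binom{-1/\beta}{k}z^{-1/\beta-k}.$$

Write $z_x=\beta\left(e^{\lambda x}-1\right)^{\alpha}$, so that $S(x;\Theta)=(1+z_x)^{-1/\beta}$ and $p(x;\Theta)=(1+z_x)^{-1/\beta}-(1+z_{x+1})^{-1/\beta}$. Because $z_x$ increases with $x$, we always have $z_x<z_{x+1}$.

Case 1. If $z_x<1$ and $z_{x+1}<1$, then

$$S(x;\Theta)=\sum_{k=0}^{\infty}w_k\left(e^{\lambda x}-1\right)^{\alpha k}, \tag{7}$$

and

$$S(x+1;\Theta)=\sum_{k=0}^{\infty}w_k\left(e^{\lambda(x+1)}-1\right)^{\alpha k}. \tag{8}$$

From equations (7) and (8), we have a linear representation of PMF (5) as follows:

$$p(x;\Theta)=\sum_{k=0}^{\infty}w_k\left[\left(e^{\lambda x}-1\right)^{\alpha k}-\left(e^{\lambda(x+1)}-1\right)^{\alpha k}\right],$$

where $w_k=\binom{-1/\beta}{k}\beta^{k}$.

Case 2. If $z_x>1$ and $z_{x+1}>1$, then

$$S(x;\Theta)=\sum_{k=0}^{\infty}v_k\left(e^{\lambda x}-1\right)^{-\alpha(k+1/\beta)}, \tag{9}$$

and

$$S(x+1;\Theta)=\sum_{k=0}^{\infty}v_k\left(e^{\lambda(x+1)}-1\right)^{-\alpha(k+1/\beta)}. \tag{10}$$

From equations (9) and (10), we have a linear representation of PMF (5) as follows:

$$p(x;\Theta)=\sum_{k=0}^{\infty}v_k\left[\left(e^{\lambda x}-1\right)^{-\alpha(k+1/\beta)}-\left(e^{\lambda(x+1)}-1\right)^{-\alpha(k+1/\beta)}\right],$$

where $v_k=\binom{-1/\beta}{k}\beta^{-1/\beta-k}$.

Case 3. If $z_x<1$ and $z_{x+1}>1$, then, from equations (7) and (10), we have a linear representation of PMF (5) as follows:

$$p(x;\Theta)=\sum_{k=0}^{\infty}\left[w_k\left(e^{\lambda x}-1\right)^{\alpha k}-v_k\left(e^{\lambda(x+1)}-1\right)^{-\alpha(k+1/\beta)}\right].$$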
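The three cases can be checked numerically. The sketch below assumes the parameterization S(x) = [1 + β(e^{λx} − 1)^α]^{−1/β} and uses hypothetical parameter values α = 2, β = 0.5, λ = 0.1. It truncates each binomial series at a fixed number of terms and compares the resulting PMF with the closed form p(x) = S(x) − S(x + 1). With these values, β(e^{λx} − 1)^α is about 0.006 at x = 1 and 0.02 at x = 2 (Case 1), about 0.75 at x = 8 and 1.07 at x = 9 (Case 3), and far above 1 at x = 30 and 31 (Case 2). The series converge slowly when this quantity is close to 1, so the 3000-term truncation is a conservative choice rather than a tuned one.

```python
import math


def survival_closed(x, alpha, beta, lam):
    """Closed form S(x) = [1 + beta*(e^{lam x} - 1)^alpha]^(-1/beta)."""
    return (1.0 + beta * math.expm1(lam * x) ** alpha) ** (-1.0 / beta)


def binom_neg(a, n_terms):
    """Generalized binomial coefficients C(-a, k), k = 0..n_terms-1, via C(-a, k) = C(-a, k-1)(-a-k+1)/k."""
    coeffs = [1.0]
    for k in range(1, n_terms):
        coeffs.append(coeffs[-1] * (-a - k + 1) / k)
    return coeffs


def survival_series(x, alpha, beta, lam, n_terms=3000):
    """Truncated series for S(x): powers of z if z < 1, powers of 1/z if z > 1."""
    a = 1.0 / beta
    z = beta * math.expm1(lam * x) ** alpha
    c = binom_neg(a, n_terms)
    if z < 1.0:
        return sum(ck * z ** k for k, ck in enumerate(c))      # form of (7)/(8)
    return sum(ck * z ** (-a - k) for k, ck in enumerate(c))   # form of (9)/(10)


def pmf_series(x, alpha, beta, lam, n_terms=3000):
    """Linear-representation PMF; Case 1, 2 or 3 follows from the branch each term takes."""
    return (survival_series(x, alpha, beta, lam, n_terms)
            - survival_series(x + 1, alpha, beta, lam, n_terms))


def pmf_closed(x, alpha, beta, lam):
    return survival_closed(x, alpha, beta, lam) - survival_closed(x + 1, alpha, beta, lam)


for x in (1, 8, 30):  # Case 1, Case 3, Case 2 for these hypothetical parameters
    print(x, pmf_series(x, 2.0, 0.5, 0.1), pmf_closed(x, 2.0, 0.5, 0.1))
```

The coefficients are built by recursion rather than from Gamma functions, which avoids overflow in Γ(1/β + k) for large k.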

3.2. Quantile Function

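The quantile function (QF) of the DEOWE distribution is the inverse function of the CDF, and it is given as follows:

$$Q(u)=\frac{1}{\lambda}\ln\left\{1+\left[\frac{(1-u)^{-\beta}-1}{\beta}\right]^{1/\alpha}\right\}-1,\qquad 0<u<1. \tag{11}$$

Because $X$ is integer-valued, the integer quantile is $\lceil Q(u)\rceil$, the smallest $x\in\{0,1,2,\ldots\}$ with $F(x;\Theta)\ge u$.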

The three quartiles of the DEOWE distribution can be obtained by setting $u=0.25$, $0.5$, and $0.75$ in equation (11).

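Bowley’s skewness (BS) and Moors’ kurtosis (MK) can be calculated from the QF $Q(\cdot)$, respectively, as follows:

$$BS=\frac{Q(3/4)+Q(1/4)-2Q(1/2)}{Q(3/4)-Q(1/4)}$$

and

$$MK=\frac{Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)}{Q(6/8)-Q(2/8)}.$$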
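The QF (11) and the two quantile-based measures can be computed directly, as in this Python sketch. It assumes the CDF form 1 − [1 + β(e^{λ(x+1)} − 1)^α]^{−1/β}, treats x as continuous when inverting it, and also returns the integer quantile (the smallest x with F(x) ≥ u). BS and MK are computed from the real-valued inverse here; using integer quantiles instead can make the denominators zero for concentrated distributions. As a sanity check, α = β = 1 reduces the baseline to the exponential distribution, for which BS = ln(4/3)/ln 3 and MK = ln(21/5)/ln 3 for any λ, because the shift and scale cancel in both ratios.

```python
import math

import numpy as np


def deowe_quantile(u, alpha, beta, lam):
    """Real-valued QF (11): solves F(x) = u for x, treating x as continuous."""
    u = np.asarray(u, dtype=float)
    inner = ((1.0 - u) ** (-beta) - 1.0) / beta
    return np.log1p(inner ** (1.0 / alpha)) / lam - 1.0


def deowe_quantile_int(u, alpha, beta, lam):
    """Integer quantile: smallest x in {0, 1, 2, ...} with F(x) >= u."""
    return np.maximum(np.ceil(deowe_quantile(u, alpha, beta, lam)), 0).astype(int)


def bowley_skewness(alpha, beta, lam):
    q1, q2, q3 = deowe_quantile([0.25, 0.5, 0.75], alpha, beta, lam)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(alpha, beta, lam):
    e1, e2, e3, e5, e6, e7 = deowe_quantile([1/8, 2/8, 3/8, 5/8, 6/8, 7/8], alpha, beta, lam)
    return (e7 - e5 + e3 - e1) / (e6 - e2)


# Exponential special case (alpha = beta = 1): compare with the closed-form values.
print(bowley_skewness(1.0, 1.0, 0.5), math.log(4 / 3) / math.log(3))
print(moors_kurtosis(1.0, 1.0, 0.5), math.log(21 / 5) / math.log(3))
```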

Table 1 reports numerical values of the mean, variance, BS, and MK of the distribution for different parameter values. These values are consistent with the plots in Figure 1.
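Numerical means and variances like those in Table 1 can be obtained by summing the PMF over a truncated support. The sketch below is one way to do it, not the authors' code; it assumes the same CDF form as above and truncates where the survival function drops below a small tolerance, a point located by inverting S. With α = β = 1 the DEOWE distribution reduces to a geometric law with q = e^{−λ}, whose mean q/(1 − q) and variance q/(1 − q)² provide a check.

```python
import numpy as np


def deowe_moments(alpha, beta, lam, tail=1e-14):
    """Mean and variance of DEOWE(alpha, beta, lam) by summing x^r p(x) over a truncated support."""
    sf = lambda t: (1.0 + beta * np.expm1(lam * t) ** alpha) ** (-1.0 / beta)
    # t_hi solves S(t_hi) = tail, so the neglected probability P(X >= x_max) is below `tail`
    t_hi = np.log1p(((tail ** (-beta) - 1.0) / beta) ** (1.0 / alpha)) / lam
    x = np.arange(int(np.ceil(t_hi)) + 2)
    p = sf(x) - sf(x + 1)
    mean = np.sum(x * p)
    var = np.sum(x ** 2 * p) - mean ** 2
    return mean, var


m, v = deowe_moments(1.5, 0.5, 0.3)   # arbitrary illustrative parameters
print(m, v, v / m)                     # v / m > 1 indicates overdispersion
```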

4. Parameter Estimation

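In this section, we use the MLE method to estimate the unknown parameters of the DEOWE distribution. Assume that $x_1,\ldots,x_n$ is a random sample of counts from the DEOWE distribution with parameters $\alpha$, $\beta$, and $\lambda$. The log-likelihood function has the form

$$\ell(\Theta)=\sum_{i=1}^{n}\ln\left\{\left[1+\beta\left(e^{\lambda x_i}-1\right)^{\alpha}\right]^{-1/\beta}-\left[1+\beta\left(e^{\lambda(x_i+1)}-1\right)^{\alpha}\right]^{-1/\beta}\right\},$$

where $\Theta=(\alpha,\beta,\lambda)$ is the vector of DEOWE parameters. Writing $S(x;\Theta)=\left[1+\beta\left(e^{\lambda x}-1\right)^{\alpha}\right]^{-1/\beta}$, the MLEs are obtained by solving the normal equations

$$\frac{\partial\ell(\Theta)}{\partial\theta}=\sum_{i=1}^{n}\frac{\partial S(x_i;\Theta)/\partial\theta-\partial S(x_i+1;\Theta)/\partial\theta}{S(x_i;\Theta)-S(x_i+1;\Theta)}=0,\qquad \theta\in\{\alpha,\beta,\lambda\}.$$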

These equations have no closed-form solution, so they are solved numerically with a nonlinear optimization algorithm such as the Newton–Raphson method.
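The paper's estimates were computed in R with the maxLik package (see the Appendix). As a rough Python analogue, the sketch below maximizes the same log-likelihood with SciPy's Nelder–Mead optimizer, a derivative-free method swapped in for Newton–Raphson. It assumes the parameterization S(x) = [1 + β(e^{λx} − 1)^α]^{−1/β}, optimizes over log-parameters to keep α, β, λ positive, and uses the Application 1 data from Section 6.1. The starting values are arbitrary, and the resulting estimates are not claimed to match Table 6.

```python
import numpy as np
from scipy.optimize import minimize

# Application 1 data (daily COVID-19 deaths, Saudi Arabia), copied from Section 6.1
deaths = np.array([9, 8, 9, 11, 8, 10, 9, 7, 9, 7, 10, 9, 7, 6, 4, 4, 5, 4, 5, 4,
                   6, 3, 5, 5, 6, 6, 3, 4, 4, 4, 2, 3, 4, 4, 3, 2, 4, 3, 4, 4,
                   3, 3, 4, 4, 5, 4, 4, 5, 5, 4, 4, 4, 6, 3])


def deowe_sf(t, alpha, beta, lam):
    return (1.0 + beta * np.expm1(lam * t) ** alpha) ** (-1.0 / beta)


def neg_loglik(log_theta, x):
    alpha, beta, lam = np.exp(log_theta)  # log scale keeps all three parameters positive
    with np.errstate(all="ignore"):
        p = deowe_sf(x, alpha, beta, lam) - deowe_sf(x + 1, alpha, beta, lam)
    if not np.all(np.isfinite(p)) or np.any(p <= 0):
        return np.inf  # numerically invalid region: reject the point
    return -np.sum(np.log(p))


def fit_deowe(x, start=(1.0, 1.0, 0.1)):
    res = minimize(neg_loglik, np.log(start), args=(x,), method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-8, "fatol": 1e-10})
    return np.exp(res.x), -res.fun


theta_hat, loglik = fit_deowe(deaths)
print("alpha, beta, lambda =", theta_hat, " log-likelihood =", loglik)
```

Nelder–Mead needs no derivatives, which sidesteps coding the partial derivatives of S; a gradient-based method (as in maxLik) would typically converge in fewer iterations and also give standard errors from the Hessian.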

5. Simulation Studies

This section presents a Monte Carlo simulation study of the MLE method for estimating the parameters of the DEOWE distribution, carried out in the R language. The experiments are based on 10 000 random samples generated from the DEOWE distribution for different true parameter values and sample sizes n = 20, 40, 70, and 100.
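The simulation design can be sketched as follows. This is a Python illustration under stated assumptions, not the R code behind Tables 2–4: samples are drawn by inverse transform, X = max(⌈Q(U)⌉, 0) with U uniform on (0, 1) and Q the QF (11); each sample is fitted by Nelder–Mead starting from the true values; and Rbias is taken as (mean of estimates − true value)/true value, one common definition. The true values (1.5, 0.5, 0.3) used in the check and the small number of replications are hypothetical choices to keep the run short, whereas the paper used 10 000 samples.

```python
import numpy as np
from scipy.optimize import minimize


def sf(t, a, b, l):
    """Baseline survival S(t) = [1 + b*(e^{l t} - 1)^a]^(-1/b) (assumed parameterization)."""
    return (1.0 + b * np.expm1(l * t) ** a) ** (-1.0 / b)


def rdeowe(n, a, b, l, rng):
    """Inverse-transform sampling: X is the smallest integer x >= 0 with F(x) >= U."""
    u = rng.random(n)
    x_u = np.log1p((((1.0 - u) ** (-b) - 1.0) / b) ** (1.0 / a)) / l - 1.0
    return np.maximum(np.ceil(x_u), 0).astype(int)


def fit(x, start):
    """MLE by Nelder-Mead on log-parameters, started at `start`."""
    def nll(log_theta):
        a, b, l = np.exp(log_theta)
        with np.errstate(all="ignore"):
            p = sf(x, a, b, l) - sf(x + 1, a, b, l)
        if not np.all(np.isfinite(p)) or np.any(p <= 0):
            return np.inf
        return -np.sum(np.log(p))
    res = minimize(nll, np.log(start), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
    return np.exp(res.x)


def simulate(theta, n, reps=200, seed=1):
    """Return (Rbias, MSE) for each parameter over `reps` simulated samples of size n."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    est = np.array([fit(rdeowe(n, *theta, rng), theta) for _ in range(reps)])
    rbias = (est.mean(axis=0) - theta) / theta
    mse = ((est - theta) ** 2).mean(axis=0)
    return rbias, mse
```

Calling `simulate(theta, n)` for n = 20, 40, 70, and 100 with a fixed `theta` gives one row per sample size, which is the layout of a simulation table; raising `reps` trades run time for Monte Carlo precision.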

For every table, we evaluate the Rbias and MSE of the estimators. Tables 2–4 summarize the simulation results for the point estimation method. We use the Rbias and MSE values to compare different parameter values and their effect on the point estimates.

In every table, we fix the value and increase the values of both and , and then study the effect of increasing and decreasing the values. Concluding remarks at the end of this section describe the impact of increasing and decreasing the parameter values.

5.1. Concluding Remarks on Simulation Results

In this subsection, we present the major findings from the simulation tables: the effect of increasing the sample size, the effect of increasing the true parameter values, and the effect of fixing two parameters while increasing the third. The following points can be noted from Tables 2–4:
(1) As the sample size increases, the Rbias and MSE values of the three parameters decrease, which confirms the consistency of the MLEs.
(2) Referring to Table 2, by making the value of and for a fixed value of and increasing from 0.01 to 0.15, we deduce that the MSE and Rbias of the parameters increase in most cases.
(3) Referring to Table 3, by fixing the value of and for a fixed value of and increasing from 0.05 to 0.5, we deduce that the MSE and Rbias of the parameters increase in most cases.
(4) By increasing the value of from 1.5 to 5, as in Table 4, with the sample size fixed for both values of beta, we deduce that the MSE and Rbias of the parameters increase in most cases.

6. Applications to COVID-19 Data

In this section, we present three real data applications: two on the COVID-19 mortality numbers in Saudi Arabia and a third on the COVID-19 mortality numbers in Latvia. The first Saudi dataset is a sample from the second wave, while the second is a sample from the first wave. The first application covers the period from 26 December 2020 to 17 February 2021. We chose this period because it was the peak of the second wave in Saudi Arabia and the numbers were recorded accurately, whereas in the earlier months of the pandemic the recording of deaths was not accurate. The second dataset covers the period from 30 May 2020 to 20 August 2020. We chose this period because it was the early phase of the COVID-19 outbreak in Saudi Arabia, when the mortality numbers started to increase; it is considered the peak of the first wave in Saudi Arabia, which makes it very important to model. We also evaluated information criteria to compare the proposed distribution with its competitors.

6.1. Application 1

In this subsection, we present an important real data application of the DEOWE distribution: the number of deaths due to COVID-19 infection in Saudi Arabia over 54 days. Table 5 contains some information and descriptive statistics for these data, which were recorded from 26 December 2020 to 17 February 2021. The data used in this application are as follows: 9, 8, 9, 11, 8, 10, 9, 7, 9, 7, 10, 9, 7, 6, 4, 4, 5, 4, 5, 4, 6, 3, 5, 5, 6, 6, 3, 4, 4, 4, 2, 3, 4, 4, 3, 2, 4, 3, 4, 4, 3, 3, 4, 4, 5, 4, 4, 5, 5, 4, 4, 4, 6, 3. These data were collected from the World Health Organization, and they represent the number of deaths per day. For more information, see the following link: https://covid19.who.int/. We compare the fit of the DEOWE distribution with those of the binomial (bionm), negative binomial (Nbionm), and Poisson (Pois) distributions (see Johnson et al. [22]), the discrete generalized exponential (DGE) distribution (see Nekoukhou et al. [17]), the discrete alpha power inverse Lomax (DAPIL) distribution introduced by Almetwally and Ibrahim [23], the discrete Marshall–Olkin generalized exponential (DMOGE) distribution introduced by Almetwally et al. [19], and the Skellam distribution introduced by Skellam [24]. The results of these fits are tabulated in Table 6.

To compare the distributions, we use four information criteria: the Akaike information criterion (AIC) [25], the Bayesian information criterion (BIC) [26], the Hannan–Quinn information criterion (HQIC) [27], and the consistent Akaike information criterion (CAIC) [28]. All of them are used to compare the goodness of fit of the proposed model with that of the competing distributions; smaller values indicate a better fit. These measures are defined as follows.

The AIC is given by

$$AIC=-2\hat{\ell}+2k.$$

The CAIC is

The BIC is calculated as follows:

$$BIC=-2\hat{\ell}+k\ln n.$$

The HQIC is

$$HQIC=-2\hat{\ell}+2k\ln(\ln n),$$

where k is the number of model parameters, n is the sample size, and $\hat{\ell}$ is the log-likelihood function evaluated at the MLEs. Table 6 provides the values of AIC, BIC, CAIC, and HQIC, together with the chi-square ($\chi^2$) statistic, its degrees of freedom, and its p-value, for all models fitted to the Saudi Arabia dataset. Figures 3–5 compare the data with the PMFs of the DEOWE distribution and of the competing distributions, which have various numbers of parameters. Figure 6 shows the CDFs of the distributions of the random variable X, Figure 7 shows the quantile function as a function of x, where x is the number of deaths per day, and Figure 8 shows graphical plots of the data and the PMF of the DEOWE distribution.
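Given a maximized log-likelihood, each criterion above is a one-line formula. The small Python helper below (function name hypothetical) computes AIC, BIC, and HQIC; CAIC is omitted here because several different formulas appear under that name in the literature. The log-likelihood values in the example are made up for illustration and are not those reported in Table 6.

```python
import math


def information_criteria(loglik: float, k: int, n: int) -> dict:
    """AIC, BIC and HQIC from the log-likelihood at the MLEs.

    loglik: maximized log-likelihood; k: number of parameters; n: sample size (n > e).
    """
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(n),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
    }


# Hypothetical log-likelihoods, for illustration only
for name, ll, k in [("three-parameter model", -120.0, 3), ("one-parameter model", -125.0, 1)]:
    print(name, information_criteria(ll, k, n=54))
```

The penalty per parameter is 2 for AIC, ln n for BIC, and 2 ln(ln n) for HQIC, so for n = 54 BIC penalizes extra parameters most heavily of the three.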

6.2. Application 2

In this subsection, the DEOWE distribution is fitted to another dataset of COVID-19 mortality numbers in Saudi Arabia over 83 days, recorded from 30 May 2020 to 20 August 2020. Table 7 contains some information and descriptive statistics for these data, while Table 8 contains the dataset used in this application with the frequency and the empirical probability of each death number. The data are as follows: 17, 22, 23, 22, 24, 30, 32, 31, 34, 36, 34, 37, 36, 38, 36, 39, 40, 39, 41, 39, 48, 45, 46, 37, 40, 39, 41, 41, 46, 37, 40, 48, 50, 49, 54, 50, 56, 58, 52, 49, 42, 41, 51, 30, 42, 20, 40, 42, 45, 37, 40, 39, 37, 34, 44, 34, 37, 31, 30, 27, 29, 27, 26, 24, 21, 30, 32, 35, 36, 35, 38, 37, 37, 32, 34, 36, 34, 35, 31, 39, 28, 34, 36. These data were collected from the World Health Organization, and they represent the number of deaths per day. For more information, see the following link: https://covid19.who.int/. We compare the fit of the DEOWE distribution with those of the discrete generalized exponential (DGE) distribution (see Nekoukhou et al. [17]), the discrete Marshall–Olkin generalized exponential (DMOGE) distribution introduced by Almetwally et al. [19], and the exponentiated discrete Weibull (EDW) distribution introduced by Nekoukhou et al. [29].

6.3. Application 3

In this subsection, the DEOWE distribution is fitted to another dataset of COVID-19 mortality numbers, in Latvia over 33 days, recorded from 12 May 2021 to 13 April 2021. We chose this period specifically because it was the peak of the second wave of COVID-19 infection in Latvia. Table 9 contains some information and descriptive statistics for these data, while Table 10 contains the dataset used in this application with the frequency and the empirical probability of each death number. Table 11 contains the MLEs of the parameters, the chi-square values and their p-values, and the information criteria for each distribution. The data are as follows: 11, 9, 11, 10, 2, 8, 12, 12, 10, 10, 5, 2, 12, 11, 13, 3, 5, 6, 5, 10, 6, 14, 9, 1, 8, 3, 3, 9, 17, 18, 5, 0, 4. These data were collected from the World Health Organization, and they represent the number of deaths per day. For more information, see the following link: https://covid19.who.int/. We compare the fit of the DEOWE distribution with those of the discrete generalized exponential (DGE) distribution (see Nekoukhou et al. [17]), the discrete Marshall–Olkin generalized exponential (DMOGE) distribution introduced by Almetwally et al. [19], and the exponentiated discrete Weibull (EDW) distribution introduced by Nekoukhou et al. [29].
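Frequency tables of the kind described for Tables 8 and 10, and simple descriptive statistics, can be produced with a few lines of standard-library Python. The sketch below uses the Latvia data listed above; the function names are our own. The index of dispersion (sample variance divided by the mean) is a quick check of the overdispersion mentioned in the Introduction: for these 33 values the mean is 8 and the sample variance is 20.5, giving an index of about 2.56.

```python
import statistics
from collections import Counter

# Application 3 data (daily COVID-19 deaths in Latvia), copied from Section 6.3
latvia = [11, 9, 11, 10, 2, 8, 12, 12, 10, 10, 5, 2, 12, 11, 13, 3, 5, 6, 5, 10,
          6, 14, 9, 1, 8, 3, 3, 9, 17, 18, 5, 0, 4]


def frequency_table(data):
    """(value, frequency, empirical probability) for each observed death count."""
    n = len(data)
    counts = Counter(data)
    return [(value, freq, freq / n) for value, freq in sorted(counts.items())]


def dispersion_summary(data):
    """Sample mean, sample variance (n - 1 denominator) and index of dispersion."""
    mean = statistics.mean(data)
    var = statistics.variance(data)
    return mean, var, var / mean  # index > 1 signals overdispersion


for value, freq, prob in frequency_table(latvia):
    print(f"{value:>3} {freq:>3} {prob:.4f}")
print(dispersion_summary(latvia))
```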

6.4. Concluding Remarks on the Real Data

(1) By referring to the goodness-of-fit measures in Tables 6 and 12, we deduce that the DEOWE distribution has the lowest chi-square, AIC, and CAIC values among all distributions for the three applications.
(2) By referring to the goodness-of-fit measures in Tables 6 and 12, we deduce that the DEOWE distribution has the highest p-value among all of its competitors for the three applications.
(3) For application one, Figures 3 and 4 show that the one- and two-parameter distributions fit the data poorly, whereas the three-parameter DEOWE distribution in Figure 5 fits the data better than all its competitors.
(4) For application two, Figure 9 shows that the three-parameter DEOWE distribution fits the data better than all its competitors.
(5) For application two, Figure 10 shows the CDFs of the distributions of the random variable X, Figure 11 shows the quantile function as a function of x, where x is the number of deaths per day, and Figure 12 shows graphical plots of the data and the PMF of the DEOWE distribution.
(6) For application three, Figure 13 shows that the three-parameter DEOWE distribution fits the data better than all its competitors. For the PMFs of the other distributions, see the Appendix.
(7) For application three, Figure 14 shows the CDFs of the distributions of the random variable X, Figure 15 shows the quantile function as a function of x, where x is the number of deaths per day, and Figure 16 shows graphical plots of the data and the PMF of the DEOWE distribution.

7. Conclusion

In this paper, we introduced a new distribution, called the DEOWE distribution, motivated by the lack of flexibility of other distributions. We studied its statistical properties and obtained a linear representation of its PMF and its quantile function. We used the MLE method to estimate the three distribution parameters. Real datasets of the mortality numbers in the Kingdom of Saudi Arabia (KSA) and Latvia were used to assess the performance of the DEOWE distribution. Its fit was compared with those of its competitors: according to the goodness-of-fit measures, the DEOWE distribution has the lowest chi-square, AIC, and CAIC values for the first dataset, and for the second dataset it has the lowest chi-square, AIC, CAIC, BIC, and HQIC values and the highest p-value among all of its competitors. These results indicate that the DEOWE distribution models the mortality numbers better than the competing distributions considered. We also plotted the data together with the fitted DEOWE and competing distributions, and the plots support the goodness-of-fit results.

Appendix

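The PMFs of the compared models are as follows.
(i) Binomial distribution: $P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x}$, x = 0, 1, 2, 3, …, n.
(ii) Poisson distribution: $P(X=x)=e^{-\mu}\mu^{x}/x!$, x = 0, 1, 2, ….
(iii) Negative binomial distribution: $P(X=x)=\binom{x+r-1}{x}p^{r}(1-p)^{x}$, x = 0, 1, 2, ….
(iv) Skellam distribution: $P(X=x)=e^{-(\mu_1+\mu_2)}\left(\mu_1/\mu_2\right)^{x/2}I_{|x|}\left(2\sqrt{\mu_1\mu_2}\right)$, x = 0, ±1, ±2, …, where $I_{\nu}$ is the modified Bessel function of the first kind.
(v) Discrete alpha power inverse Lomax: see Almetwally and Ibrahim [23].
(vi) Discrete generalized exponential distribution: $P(X=x)=\left(1-p^{x+1}\right)^{\alpha}-\left(1-p^{x}\right)^{\alpha}$, x = 0, 1, 2, ….
(vii) Discrete Marshall–Olkin generalized exponential distribution: see Almetwally et al. [19].
(viii) Exponentiated discrete Weibull: $P(X=x)=\left(1-p^{(x+1)^{\gamma}}\right)^{\alpha}-\left(1-p^{x^{\gamma}}\right)^{\alpha}$, x = 0, 1, 2, ….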

For more information about the code used in this paper, see the function maxLik of the R package maxLik, which was also used by Pho, K. H., and Nguyen, V. T. (2018), Comparison of Newton–Raphson algorithm and Maxlik function, Journal of Advanced Engineering and Computation, 2(4), 281–292, and Henningsen, A., and Toomet, O. (2011), maxLik: A package for maximum likelihood estimation in R, Computational Statistics, 26(3), 443–458.

Data Availability

All data links and references are provided within the article.

Conflicts of Interest

The authors declare no conflicts of interest regarding this paper.

Authors’ Contributions

All authors supervised, validated, visualized, wrote the original draft, and reviewed and edited the manuscript. Editing and replying to reviewers have been done by Dr Taghreed M. Jawa and Neveen Sayed-Ahmed. Proof editing and fixing typos and revising the language have been done by both Dr Taghreed M. Jawa and Dr Neveen Sayed-Ahmed.

Acknowledgments

This work was supported by the Taif University Researchers Supporting Project number TURSP-2020/318, Taif University, Taif, Saudi Arabia.