Modeling Cancer Remission Time Data by Means of the Max Erlang Binomial Distribution

Munteanu, Bogdan Gheorghe

doi:https://doi.org/10.1155/2021/9932729

Computational and Mathematical Methods in Medicine


Research Article | Open Access

Volume 2021 | Article ID 9932729 | https://doi.org/10.1155/2021/9932729


Academic Editor: Rafik Karaman

Received 08 Mar 2021

Accepted 21 Aug 2021

Published 25 Sept 2021

Abstract

In this paper, a statistical simulation algorithm for a power series distribution, called the Max Erlang Binomial distribution, is proposed, analyzed, and tested on bladder cancer remission time data. Alongside the simulation technique, the EM algorithm for estimating the model parameters is described.

1. Introduction

This new (generalized) distribution addresses reliability problems in which a lifetime can be expressed as the maximum or minimum of a sequence of independent and identically distributed (iid) random variables that represent the risk times of the system components. In recent years, several authors have proposed new distributions for the maximum and minimum of a sequence of iid random variables. For example, Adamidis and Loukas [1], Kus [2], Tahmasbi and Rezaei [3], Louzada et al. [4], and Cancho et al. [5] determined the distribution of the maximum or minimum when the components of the sequence are exponentially distributed and the number of components is a discrete random variable. Flores et al. [6] then treated the distribution of the maximum of exponentially distributed components whose random number follows a power series distribution; this is called the complementary exponential power series (CEPS) distribution. Morais and Barreto-Souza [7] likewise combined the Weibull distribution class with the power series distribution class (WPS). More recently, Louzada et al. [8] developed a mathematical model that unifies the procedure for obtaining the distribution of the maximum and minimum of a random number of absolutely continuous iid random variables, where the random number is characterized by its generating function. However, the problem of determining the general formula when that random number follows a power series distribution remains unsolved.

In this paper, simulation algorithms for this family of distributions are proposed. This study is intended to complement the research by Balkema and de Haan (1974), Bryson (1974), Ahsanullah (1991), Balakrishnan and Ahsanullah (1994), Childs and others (2001), Al Awadhi and Ghitany (2001, 2007), Zahrani and Harbi (2013), Al-Zahrani and Sagor (2014), Tahir and Cordeiro ([9], 2016), Hassan and Abd-Elfattah (2016), and Munteanu ([10], 2013). The algorithm was implemented in the Eclipse SDK programming environment.

This work is structured as follows. Section 2 presents the mathematical properties of the Max Erlang Binomial power series distribution (the cumulative distribution function, the probability density function, the mean, and the variance). Section 3 formulates and analyzes the simulation technique for the Max Erlang Binomial distribution and validates the results via the Pearson test. Section 4 proposes and tests an algorithm for estimating the parameters of the Max Erlang Binomial distribution by the maximum likelihood method. Section 5 discusses an application of the proposed distribution to a real-life dataset. Lastly, Section 6 draws some conclusions.

2. Development of the Mathematical Model

In [11], the properties of a new power series type distribution, called the Max Erlang Binomial (MaxErlB) distribution, are introduced and studied. As a mathematical model, this distribution describes the probabilistic behavior of lifetimes, which are widely used in studying the reliability of systems. In [11], it is presented as the distribution of the maximum value in a sample of random size $Z$ from an Erlang-distributed statistical population, where $Z$ is a zero-truncated, binomially distributed random variable. Formally, the setting is as follows.

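Let us consider the random variable $Z$ such that $Z \in \{1, 2, \ldots\}$.

Definition 1 ([12]). We say that the random variable $Z$ has a power series distribution if
$$P(Z = z) = \frac{a_z \theta^z}{A(\theta)}, \quad z = 1, 2, \ldots, \tag{1}$$
where $a_1, a_2, \ldots$ are nonnegative real numbers, $\theta \in (0, \tau)$, $\tau$ is a positive number bounded by the convergence radius of the power series (series function) $A(\theta) = \sum_{z \ge 1} a_z \theta^z$, and $\theta$ is the power parameter of the distribution (Table 1).

PSD denotes the family of power series distributions. If the random variable $Z$ has the distribution in Equation (1), then we write $Z \in \mathrm{PSD}$.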
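As an illustration of Definition 1, the zero-truncated binomial distribution used later in the paper can be written in power series form. The sketch below assumes the standard choice $a_z = \binom{n}{z}$, $\theta = p/(1-p)$, and $A(\theta) = (1+\theta)^n - 1$; the function names are illustrative and do not come from the source.

```python
from math import comb

def truncated_binomial_pmf_psd(z, n, p):
    """P(Z = z) in power series form a_z * theta**z / A(theta)."""
    theta = p / (1.0 - p)            # power parameter (assumed choice)
    a_z = comb(n, z)                 # coefficients a_z = C(n, z), z = 1..n
    series = (1.0 + theta) ** n - 1.0  # series function A(theta)
    return a_z * theta ** z / series

def truncated_binomial_pmf_direct(z, n, p):
    """Binomial pmf conditioned on the event Z >= 1."""
    return comb(n, z) * p ** z * (1.0 - p) ** (n - z) / (1.0 - (1.0 - p) ** n)

n, p = 5, 0.3
psd = [truncated_binomial_pmf_psd(z, n, p) for z in range(1, n + 1)]
direct = [truncated_binomial_pmf_direct(z, n, p) for z in range(1, n + 1)]
```

Both lists agree term by term, which is the sense in which the zero-truncated binomial distribution belongs to the PSD family.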

We consider $U = \max\{X_1, X_2, \ldots, X_Z\}$, $Z \in \mathrm{PSD}$, where $(X_i)_{i \ge 1}$ are iid random variables with the distribution function $F$ and the probability density function $f$.

We note that .

The results in this section are obtained using the general framework in [13], for which reason some proofs are not presented.

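Proposition 1 (see [11]). If the random variable $U_{\mathrm{Erl}} = \max\{X_1, X_2, \ldots, X_Z\}$, where $(X_i)_{i \ge 1}$ are nonnegative iid random variables, $X_i \sim \mathrm{Erl}(k, \lambda)$, $k \in \{1, 2, \ldots\}$, $\lambda > 0$, and $Z$ is zero-truncated binomially distributed with parameters $n \in \{1, 2, \ldots\}$ and $p \in (0, 1)$, that is, $Z \in \mathrm{PSD}$ with $\theta = p/(1-p)$ and $A(\theta) = (1+\theta)^n - 1$, the random variables $(X_i)_{i \ge 1}$ and $Z$ being independent, then the cumulative distribution function and the probability density function of the random variable $U_{\mathrm{Erl}}$ are the following:
$$U(x) = \frac{\left(1 - p\, e^{-\lambda x} \sum_{i=0}^{k-1} \frac{(\lambda x)^i}{i!}\right)^{n} - (1-p)^n}{1 - (1-p)^n}, \quad x > 0, \tag{2}$$
$$u(x) = \frac{n p \lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!\,\left[1 - (1-p)^n\right]} \left(1 - p\, e^{-\lambda x} \sum_{i=0}^{k-1} \frac{(\lambda x)^i}{i!}\right)^{n-1}, \quad x > 0. \tag{3}$$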

Definition 2 (see [11]). We say that the random variable has a Max Erlang Binomial power series distribution with parameters , , and (), if it has the cumulative distribution function (cdf) defined by Equation (2) and probability density function (pdf) defined by Equation (3).
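The cdf and pdf of the maximum of a zero-truncated binomial number of Erlang variables can be evaluated numerically. The following is a minimal sketch, assuming the Erlang distribution is parametrized by the shape $k$ and the rate $\lambda$, so that the cdf of the maximum is $[(1 - p + pF(x))^n - (1-p)^n] / [1 - (1-p)^n]$ with $F$ the Erlang cdf. The function names are hypothetical.

```python
from math import exp, factorial

def erlang_cdf(x, k, lam):
    """Erlang cdf with shape k and rate lam (assumed parametrization)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - exp(-lam * x) * sum((lam * x) ** i / factorial(i) for i in range(k))

def erlang_pdf(x, k, lam):
    """Erlang pdf with shape k and rate lam."""
    if x <= 0.0:
        return 0.0
    return lam ** k * x ** (k - 1) * exp(-lam * x) / factorial(k - 1)

def maxerlb_cdf(x, k, lam, n, p):
    """cdf of the maximum of Z Erlang variables, Z zero-truncated binomial."""
    q = 1.0 - p
    return ((q + p * erlang_cdf(x, k, lam)) ** n - q ** n) / (1.0 - q ** n)

def maxerlb_pdf(x, k, lam, n, p):
    """pdf obtained by differentiating maxerlb_cdf with respect to x."""
    q = 1.0 - p
    big_f = erlang_cdf(x, k, lam)
    return n * p * erlang_pdf(x, k, lam) * (q + p * big_f) ** (n - 1) / (1.0 - q ** n)

# Consistency check: a central difference of the cdf should match the pdf.
k, lam, n, p = 2, 0.5, 4, 0.3
x, h = 2.0, 1e-5
slope = (maxerlb_cdf(x + h, k, lam, n, p) - maxerlb_cdf(x - h, k, lam, n, p)) / (2 * h)
```

For $n = 1$ the sample always has size one, and `maxerlb_cdf` reduces to the Erlang cdf itself.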

The numerical characteristics (mean, variance) of a random variable with a MaxErlB distribution, in a particular case (), are presented in the following result:

Proposition 2. The mean and variance of the random variable , , , , are characterized by the following relations:

Proof. After Equation (3) and the definition of the mean, we obtain where , as developed by Newton’s binomial. A sum of -integrals then can be solved with elementary methods (method of integration by parts), which leads to Equation (4).
Similarly, evaluating the second-order moment and applying the definition of the variance leads to Equation (5).
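The technique used in the proof (binomial expansion followed by term-by-term integration) can be checked numerically. The particular case treated in Proposition 2 is not recoverable from the text here, so the sketch below assumes exponential components ($k = 1$); under that assumption the expansion gives $E(U) = \frac{np}{\lambda[1-(1-p)^n]} \sum_{j=0}^{n-1} \binom{n-1}{j} \frac{(-p)^j}{(j+1)^2}$ and an analogous sum with $(j+1)^3$ for the second moment. These closed forms and all names are derived for this illustration and are not quoted from the source.

```python
from math import comb, exp

def moments_closed_form(lam, n, p):
    """Mean and variance for k = 1 via the binomial expansion (assumed case)."""
    c = n * p / (1.0 - (1.0 - p) ** n)
    s2 = sum(comb(n - 1, j) * (-p) ** j / (j + 1) ** 2 for j in range(n))
    s3 = sum(comb(n - 1, j) * (-p) ** j / (j + 1) ** 3 for j in range(n))
    mean = c * s2 / lam
    second_moment = 2.0 * c * s3 / lam ** 2
    return mean, second_moment - mean ** 2

def moments_numeric(lam, n, p, upper=60.0, steps=100_000):
    """Midpoint-rule integration of x*u(x) and x^2*u(x) on [0, upper]."""
    c = n * p / (1.0 - (1.0 - p) ** n)
    h = upper / steps
    m1 = m2 = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        u = c * lam * exp(-lam * x) * (1.0 - p * exp(-lam * x)) ** (n - 1)
        m1 += x * u * h
        m2 += x * x * u * h
    return m1, m2 - m1 ** 2

mean_cf, var_cf = moments_closed_form(1.0, 4, 0.3)
mean_num, var_num = moments_numeric(1.0, 4, 0.3)
```

For $n = 1$ the closed form collapses to the exponential moments $1/\lambda$ and $1/\lambda^2$, which gives a quick sanity check.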

Remark 1. We notice that for $k = 1$ the Erlang distribution reduces to the exponential distribution, and we obtain the complementary exponential distribution introduced by Flores et al. [6].

3. Statistical Simulation for the MaxErlB Distribution

The algorithm relies on two facts. First, a random variable with the MaxErlB distribution has the same distribution as $\max\{X_1, X_2, \ldots, X_Z\}$, where $(X_i)_{i \ge 1}$ are iid Erlang random variables and $Z$ is a zero-truncated binomial random variable independent of them. Second, a value of the zero-truncated binomial random variable $Z$ coincides with a value of an ordinary binomial random variable with the same parameters, provided that this value is nonzero. We can therefore briefly describe the following algorithm.

3.1. Statistical Simulation Algorithm for the MaxErlB Distribution

Step 1. We generate a value $z$ of a binomial random variable with parameters $n$ and $p$,

Step 2. If $z = 0$, then GO TO Step 1; otherwise,

Step 3. For the value $z$ (generated in Steps 1 and 2), we simulate the values $x_1, x_2, \ldots, x_z$ as values of $z$ iid random variables with the Erlang distribution with parameters $k$ and $\lambda$,

Step 4. We take $u = \max\{x_1, x_2, \ldots, x_z\}$, STOP.
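The four steps can be sketched in a few lines of Python. This is an illustrative sketch and not the author's Eclipse implementation: the binomial value is generated as a sum of Bernoulli trials, and each Erlang value as a sum of $k$ exponential values with rate $\lambda$ (an assumed parametrization). The cdf helper is included only to compare the simulated sample with the theoretical distribution at one point, and all names are hypothetical.

```python
import random
from math import exp, factorial

def simulate_maxerlb(k, lam, n, p, rng):
    """One simulated value following Steps 1-4."""
    while True:
        z = sum(rng.random() < p for _ in range(n))  # Step 1: binomial value
        if z > 0:                                    # Step 2: reject z = 0
            break
    # Step 3: z Erlang values, each a sum of k exponentials with rate lam
    xs = [sum(rng.expovariate(lam) for _ in range(k)) for _ in range(z)]
    return max(xs)                                   # Step 4: the maximum

def maxerlb_cdf(x, k, lam, n, p):
    """Theoretical cdf of the maximum, used here only as a sanity check."""
    tail = exp(-lam * x) * sum((lam * x) ** i / factorial(i) for i in range(k))
    return ((1.0 - p * tail) ** n - (1.0 - p) ** n) / (1.0 - (1.0 - p) ** n)

rng = random.Random(2021)  # fixed seed so the run is reproducible
k, lam, n, p = 2, 0.5, 4, 0.3
sample = [simulate_maxerlb(k, lam, n, p, rng) for _ in range(20000)]

x0 = 4.0
empirical = sum(u <= x0 for u in sample) / len(sample)
theoretical = maxerlb_cdf(x0, k, lam, n, p)
```

Rejecting $z = 0$ and redrawing is what turns the ordinary binomial generator into a zero-truncated one, at the cost of a fraction $(1-p)^n$ of wasted draws.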

Following the simulation, we can apply the Chi-square goodness-of-fit test. Based on the simulated values, the Chi-square criterion (Pearson's criterion) is applied to test the basic (null) hypothesis against the alternative hypothesis:

$H_0$: the sample values are values of a MaxErlB distributed random variable

$H_1$: the sample values are not values of a MaxErlB distributed random variable.

The hypothesis $H_0$ is accepted if the empirical value of $\chi^2$ is less than the upper critical value of the Chi-square distribution with the corresponding degrees of freedom at the chosen significance level. The statistic of Pearson's test is calculated using the following relation:
$$\chi^2 = \sum_{i=1}^{r} \frac{(n_i - N p_i)^2}{N p_i},$$
where $N$ is the sample size and $n_i$ represents the number of observed values in the interval $I_i = [a_{i-1}, a_i)$, $i = 1, \ldots, r$.

The probabilities $p_i$ that the random variable takes values in the interval $I_i$ are calculated using the following relation:
$$p_i = U(a_i) - U(a_{i-1}),$$
where $U$ is the cdf in Equation (2) and $a_{i-1}$, $a_i$ represent the ends of each interval after the intervals have been merged.
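The Pearson statistic can be computed directly from a simulated sample. The sketch below is illustrative only: the interval cut points are chosen by hand, the last interval is taken as open-ended, and the critical value 16.919 is the tabulated upper 5% point of the Chi-square distribution with 9 degrees of freedom (10 intervals, no estimated parameters). These choices are assumptions made for the example and are not taken from the source.

```python
import random
from math import exp, factorial, inf

def maxerlb_cdf(x, k, lam, n, p):
    """cdf of the maximum of Z Erlang(k, lam) variables, Z zero-truncated binomial."""
    if x == inf:
        return 1.0
    tail = exp(-lam * x) * sum((lam * x) ** i / factorial(i) for i in range(k))
    return ((1.0 - p * tail) ** n - (1.0 - p) ** n) / (1.0 - (1.0 - p) ** n)

def simulate_maxerlb(k, lam, n, p, rng):
    """One simulated value: redraw the binomial until nonzero, then take the max."""
    z = 0
    while z == 0:
        z = sum(rng.random() < p for _ in range(n))
    return max(sum(rng.expovariate(lam) for _ in range(k)) for _ in range(z))

def pearson_statistic(sample, cdf, cuts):
    """Chi-square statistic over [0, c1), [c1, c2), ..., [c_last, inf)."""
    bounds = [0.0] + list(cuts) + [inf]
    size = len(sample)
    counts = [0] * (len(bounds) - 1)
    for u in sample:
        for i in range(len(counts)):
            if bounds[i] <= u < bounds[i + 1]:
                counts[i] += 1
                break
    probs = [cdf(bounds[i + 1]) - cdf(bounds[i]) for i in range(len(counts))]
    chi2 = sum((c - size * q) ** 2 / (size * q) for c, q in zip(counts, probs))
    return chi2, counts, probs

rng = random.Random(7)
k, lam, n, p = 2, 0.5, 4, 0.3
sample = [simulate_maxerlb(k, lam, n, p, rng) for _ in range(5000)]
cuts = [1, 2, 3, 4, 5, 6, 8, 10, 13]          # hand-picked cut points (assumption)
chi2, counts, probs = pearson_statistic(
    sample, lambda x: maxerlb_cdf(x, k, lam, n, p), cuts)
critical = 16.919                              # chi-square, 9 df, alpha = 0.05
accept_h0 = chi2 < critical
```

In practice, intervals with small expected counts would be merged before computing the statistic, which reduces the number of degrees of freedom accordingly.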

Based on the algorithm presented above, we can notice (see Table 2) that the empirical mean and variance of the simulated values closely approximate the theoretical mean and variance of the random variable (Proposition 2), and that the Chi-square criterion validates, each time, the basic hypothesis according to which the simulated values are indeed governed by this distribution.

Moreover, the validation is confirmed for samples values .

The histogram of the simulated data and the plot of the probability density function of the simulated distribution (Figure 1) confirm the validity of the basic hypothesis visually as well.

4. EM Algorithm for the MaxErlB Distribution

The EM algorithm, introduced in 1977 in [14], refines the maximum likelihood method, which becomes practically unusable when processing incomplete statistical data. Next, the algorithm is implemented for the MaxErlB distribution.

We consider the values $x_1, x_2, \ldots, x_m$ of a sample of size $m$ from a statistical population governed by a MaxErlB distribution with the probability density function $u(x; \Theta)$, $x > 0$, which depends on the parameter vector $\Theta = (\lambda, p)$, given that the parameter $n$ of the zero-truncated binomial distribution and the parameter $k$ of the Erlang distribution are given. According to the definition of the likelihood function and Equation (3), we have
$$L(\Theta) = \prod_{i=1}^{m} u(x_i; \Theta).$$

To obtain the maximum likelihood equations for the MaxErlB distribution regarding the estimation of the parameters $\lambda$ and $p$, we consider
$$\ell(\Theta) = \ln L(\Theta) = \sum_{i=1}^{m} \ln u(x_i; \Theta).$$

With the parameters $n$ and $k$ considered known, the maximum likelihood estimation (MLE) equations are characterized by the nonlinear system $\partial \ell / \partial \Theta = 0$, where $\Theta = (\lambda, p)$. Developing this system of equations, we notice that it becomes difficult to solve with respect to the unknowns $\lambda$ and $p$. We are thus in a situation that requires the EM algorithm, explained and analyzed by Dempster et al. [14] and then expanded by McLachlan and Krishnan [15]. In this algorithm, the random variable $Z$ is considered a latent random variable, that is, a random variable which cannot be observed directly.

For this, we consider, formally, the following sample: by observations of the random variable .

This shows that can be interpreted as a complete set of statistics, being, in this case, a sample of incomplete data. The description of the EM algorithm supposes a known conditional mean , where .

The probability density function , , of the random variable which corresponds to a complete set of data, is defined by the following relation according to the definition of probability density in the case of the maximum (see [13], Consequence 2.2):

In these conditions, the probability density function of the random variable which corresponds to a complete set of data is given by

where , , , , , , and are the probability density function, respectively, the cumulative distribution function which has the distribution.

Then, the probability density function of the random variable conditioned by the random variable has the following expression:

Therefore, considering the obvious relation , the conditional mean becomes

Since , with , , , , we have
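The conditional mean needed in the expectation step can be derived in a few lines. The following is a sketch under the assumption that $Z$ is zero-truncated binomial with parameters $n$ and $p$ and that $X$ is the maximum of $Z$ iid variables with cdf $F$ and pdf $f$; the symbols $g$ and $u$ are notation chosen here, and the final expression may differ in form from the source's Equation (17).

```latex
\begin{align*}
% Joint density of (X, Z): the maximum of z iid variables has density z f F^{z-1}
g(x, z) &= \binom{n}{z} \frac{p^{z} (1-p)^{n-z}}{1 - (1-p)^{n}} \; z\, f(x)\, F(x)^{z-1},
  \qquad z = 1, \dots, n, \\
% Marginal density of X, using z \binom{n}{z} = n \binom{n-1}{z-1}
u(x) &= \sum_{z=1}^{n} g(x, z)
  = \frac{n p\, f(x)\, \bigl[1 - p + p F(x)\bigr]^{n-1}}{1 - (1-p)^{n}}, \\
% Conditional distribution of Z given X = x
P(Z = z \mid X = x) &= \frac{g(x, z)}{u(x)}
  = \binom{n-1}{z-1}
    \frac{\bigl[p F(x)\bigr]^{z-1} (1-p)^{n-z}}{\bigl[1 - p + p F(x)\bigr]^{n-1}}, \\
% Hence Z - 1 given X = x is binomial with n - 1 trials
E(Z \mid X = x) &= 1 + (n-1)\, \frac{p F(x)}{1 - p + p F(x)}.
\end{align*}
```

In other words, given $X = x$, the variable $Z - 1$ is binomial with $n - 1$ trials and success probability $pF(x) / [1 - p + pF(x)]$, so the expectation step needs only the cdf $F$ evaluated at each observation.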

We describe the EM algorithm for the MaxErlB distribution as an iterative process that estimates the unknown parameter vector through successive iterates, computed until the stopping condition (18) is satisfied or until a preset maximum number of iterations is reached.

The steps of the EM algorithm for MaxErlB distribution are the following:

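Step 1. We set the iteration index $h = 0$ and take initial values for the unknown parameters,

Step 2. (Expectation). At iteration $h$, for each observation in the sample, we calculate the conditional mean value of the latent random variable given that observation, according to Equation (17), using the current parameter estimates,

Step 3. (Maximization). Through the maximum likelihood estimation (MLE) method, we take into consideration the completed sample, formed by the observations together with the conditional means from Step 2, and maximize its likelihood function. Thus, we find the iterate $h + 1$ of the parameter estimates,

Step 4. We examine Equation (18). If it is NOT satisfied, then GO TO Step 2; otherwise, we take the current iterate as the estimate, STOP.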

Given the function the maximum likelihood equations are characterized by the nonlinear system , namely

Table 3 shows the results obtained from the implementation of the EM algorithm described above in the Octave 1.5.4 GUI programming environment. For different sample sizes, we obtain very good approximations of the parameters $\lambda$ and $p$ that characterize the MaxErlB distribution when the parameters $k$ and $n$ are known.
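The iteration described in this section can be sketched in Python. This is a minimal sketch and not the author's Octave program. It assumes that $k$ and $n$ are known and that the rate $\lambda$ and the probability $p$ are estimated, and it uses the conditional mean $E(Z \mid X = x) = 1 + (n-1)pF(x)/[1 - p + pF(x)]$ in the expectation step. The maximization step solves the $p$ equation by bisection and the $\lambda$ problem by golden-section search. The search bounds, tolerances, starting values, and all function names are hypothetical choices.

```python
import random
from math import exp, factorial, log

def erlang_cdf(x, k, lam):
    return 1.0 - exp(-lam * x) * sum((lam * x) ** i / factorial(i) for i in range(k))

def log_likelihood(xs, k, n, lam, p):
    """Observed-data log-likelihood of the maximum (for monitoring only)."""
    const = log(n * p) - log(1.0 - (1.0 - p) ** n) - log(factorial(k - 1))
    total = 0.0
    for x in xs:
        total += const + k * log(lam) + (k - 1) * log(x) - lam * x
        total += (n - 1) * log(1.0 - p + p * erlang_cdf(x, k, lam))
    return total

def golden_max(fn, lo, hi, iters=40):
    """Golden-section search for the maximum of a unimodal function."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = fn(c), fn(d)
    for _ in range(iters):
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = fn(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = fn(d)
    return (a + b) / 2

def solve_p(zbar, n, iters=60):
    """Bisection for n*p / (1 - (1-p)^n) = zbar, the M-step equation in p."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = (lo + hi) / 2
        if n * mid / (1.0 - (1.0 - mid) ** n) < zbar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def em_maxerlb(xs, k, n, lam, p, tol=1e-5, max_iter=50):
    for _ in range(max_iter):
        # E-step: conditional mean of the latent sample size for each x
        cdfs = [erlang_cdf(x, k, lam) for x in xs]
        zs = [1.0 + (n - 1) * p * f / (1.0 - p + p * f) for f in cdfs]
        # M-step: the complete-data log-likelihood separates in p and lam
        p_new = solve_p(sum(zs) / len(zs), n)
        def q_lam(rate):
            return sum(k * log(rate) - rate * x
                       + (z - 1.0) * log(max(erlang_cdf(x, k, rate), 1e-300))
                       for x, z in zip(xs, zs))
        lam_new = golden_max(q_lam, 1e-3, 10.0)  # hypothetical search bounds
        converged = abs(lam_new - lam) < tol and abs(p_new - p) < tol
        lam, p = lam_new, p_new
        if converged:
            break
    return lam, p

def simulate_maxerlb(k, lam, n, p, rng):
    z = 0
    while z == 0:
        z = sum(rng.random() < p for _ in range(n))
    return max(sum(rng.expovariate(lam) for _ in range(k)) for _ in range(z))

rng = random.Random(11)
k, n = 2, 4
xs = [simulate_maxerlb(k, 0.5, n, 0.3, rng) for _ in range(200)]
lam_hat, p_hat = em_maxerlb(xs, k, n, 1.0, 0.5)
```

Because each maximization step does not decrease the expected complete-data log-likelihood, the observed log-likelihood at `(lam_hat, p_hat)` should be at least as large as at the starting values; this can be checked with `log_likelihood`. With only 200 observations the estimates can be far from the true values, so the sketch makes no claim about accuracy.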

5. Application

We will now consider a dataset which represents the remission times (in months) of a random sample of 128 bladder cancer patients. The dataset itself has previously been used in [16–18]. It is summarized as follows: 0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63, 0.20, 2.23, 3.52, 4.98, 6.97, 9.02, 13.29, 0.40, 2.26, 3.57, 5.06, 7.09, 9.22, 13.80, 25.74, 0.50, 2.46, 3.64, 5.09, 7.26, 9.47, 14.24, 25.82, 0.51, 2.54, 3.70, 5.17, 7.28, 9.74, 14.76, 26.31, 0.81, 2.62, 3.82, 5.32, 7.32, 10.06, 14.77, 32.15, 2.64, 3.88, 5.32, 7.39, 10.34, 14.83, 34.26, 0.90, 2.69, 4.18, 5.34, 7.59, 10.66, 15.96, 36.66, 1.05, 2.69, 4.23, 5.41, 7.62, 10.75, 16.62, 43.01, 1.19, 2.75, 4.26, 5.41, 7.63, 17.12, 46.12, 1.26, 2.83, 4.33, 5.49, 7.66, 11.25, 17.14, 79.05, 1.35, 2.87, 5.62, 7.87, 11.64, 17.36, 1.40, 3.02, 4.34, 5.71, 7.93, 11.79, 18.10, 1.46, 4.40, 5.85, 8.26, 11.98, 19.13, 1.76, 3.25, 4.50, 6.25, 8.37, 12.02, 2.02, 3.31, 4.51, 6.54, 8.53, 12.03, 20.28, 2.02, 3.36, 6.76, 12.07, 21.73, 2.07, 3.36, 6.93, 8.65, 12.63, and 22.69.

Figure 2 provides the histogram of relative frequencies for the sample of size 128 which characterizes the remission times of bladder cancer, where the curve represents the pdf of the MaxErlB distribution defined by Equation (3).

6. Conclusion

This research studied power series type distributions of the maximum of a sequence of iid random variables whose length is random.

The distribution of the maximum of a random number of iid random variables, where that random number follows a distribution from the PSD family, was presented in a compact, coherent approach.

For this purpose, programs for the statistical simulation of the MaxErlB power series type distribution were developed. The validity of the simulated maximum distributions was checked using Pearson's goodness-of-fit test and is reflected in Table 2. The results of the EM algorithm, implemented in the GUI Octave 1.5.4 programming environment to estimate the parameters of the MaxErlB distribution, are presented in Table 3.

A real dataset on bladder cancer remission times was used to compare the histogram of the relative frequencies of remission times with the plot of the probability density function of the MaxErlB distribution assumed to govern the remission time values (Figure 2).

Data Availability

All data are fully available without restriction.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. K. Adamidis and S. Loukas, “A lifetime distribution with decreasing failure rate,” Statistics and Probability Letters, vol. 39, no. 1, pp. 35–42, 1998.
2. C. Kus, “A new lifetime distribution,” Computational Statistics and Data Analysis, vol. 51, no. 9, pp. 4497–4509, 2007.
3. R. Tahmasbi and S. Rezaei, “A two-parameter lifetime distribution with decreasing failure rate,” Computational Statistics and Data Analysis, vol. 52, no. 8, pp. 3889–3901, 2008.
4. F. Louzada, M. Roman, and V. G. Cancho, “The complementary exponential geometric distribution: model, properties, and a comparison with its counterpart,” Computational Statistics and Data Analysis, vol. 55, no. 8, pp. 2516–2524, 2011.
5. V. G. Cancho, F. Louzada-Neto, and G. Barriga, “The Poisson-exponential lifetime distribution,” Computational Statistics and Data Analysis, vol. 55, no. 1, pp. 677–686, 2011.
6. D. J. Flores, P. Borges, V. G. Cancho, and F. Louzada, “The complementary exponential power series distribution,” Brazilian Journal of Probability and Statistics, vol. 27, no. 4, pp. 565–584, 2013.
7. A. L. Morais and W. A. Barreto-Souza, “A compound class of Weibull and power series distributions,” Computational Statistics and Data Analysis, vol. 55, no. 3, pp. 1410–1425, 2011.
8. F. Louzada, M. P. E. Bereta, and M. A. P. Franco, “On the distribution of the minimum or maximum of a random number of i.i.d. lifetime random variables,” Applied Mathematics, vol. 3, no. 4, pp. 350–353, 2012.
9. M. H. Tahir and G. M. Cordeiro, “Compounding of distributions: a survey and new generalized classes,” Journal of Statistical Distributions and Applications, vol. 3, pp. 2–35, 2016.
10. B. G. Munteanu, “The Min-Pareto power series distributions of lifetime,” Applied Mathematics & Information Sciences, vol. 10, no. 5, pp. 1673–1679, 2016.
11. A. Leahu, B. G. Munteanu, and S. Cataranciuc, “Max-Erlang and Min-Erlang power series distributions as two new families of lifetime distribution,” Buletinul Academiei de Stiinte a Republicii Moldova. Matematica, vol. 2, no. 75, pp. 60–73, 2014.
12. N. L. Johnson, A. W. Kemp, and S. Kotz, Univariate Discrete Distributions, Wiley, Hoboken, NJ, USA, 2005.
13. A. Leahu, B. G. Munteanu, and S. Cataranciuc, “On the lifetime as the maximum or minimum of the sample with power series distributed size,” Romai Journal, vol. 9, no. 2, pp. 119–128, 2013.
14. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society Series B, vol. 39, no. 1, pp. 1–38, 1977.
15. G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley, New York, NY, USA, 1997.
16. T. G. Ieren and A. U. Chukwu, “Bayesian estimation of a shape parameter of the Weibull-Frechet distribution,” Asian Journal of Probability and Statistics, vol. 2, no. 1, pp. 1–19, 2018.
17. E. T. Lee and J. W. Wang, Statistical Methods for Survival Data Analysis, Wiley, Hoboken, NJ, USA, 3rd edition, 2003.
18. E. A. Rady, W. A. Hassanein, and T. A. Elhaddad, “The power Lomax distribution with an application to bladder cancer data,” SpringerPlus, vol. 5, no. 1, p. 1838, 2016.

Copyright

Copyright © 2021 Bogdan Gheorghe Munteanu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
