Special Issue: Computational Mathematics and Numerical Analysis
Statistical Analysis of Water Purification (Using Vinyl Chloride) Data
Water is essential to life because of its impact on food supply and the natural environment for all living things. Only about 0.3 percent of the world's water resources are usable. Groundwater is the principal source of drinking water, but air, sea, land, rivers, lakes, oceans, and wells are also essential sources of water. Many statistical models have been used to analyze water data sets in order to make better predictions with limited water resources. In this paper, a new model is proposed for water purification data on vinyl chloride. Various statistical properties of the proposed model are derived. Maximum likelihood estimation is used to estimate the model parameters, and Monte Carlo simulations are used to show the consistency of the estimates. The proposed model is applied to the vinyl chloride water purification data and compared with the existing EGF, GF, and F models. The results show that the proposed model fits this data set better than the EGF, GF, and F distributions.
In this century, one of the main issues of water resources is their preservation and rational utilization. Water is a valuable resource in terms of its quality, quantity, and location, with quality referring to the state of the water system as measured by chemical, biological, and physico-chemical indicators. Economic and geographic influence and the influence on ecosystems and human health are all key factors to consider when assessing the top-priority issues of water quality. Changes in the physico-chemical features of water quality are determined by anthropogenic factors and also by natural processes such as lithology and topography, hydrological conditions, precipitation inputs, climate, edaphic and tectonic factors, and catchment area erosion, in combination with environmental influences. The greatest threat to water quality is posed by point sources from industries and municipalities. Anthropogenic influences can have negative consequences for water quality within a short time period. The organic solid load and the dynamics of its degradation are very good indicators of anthropogenic impact on water.
Water pollution threatens the environment, and a water resource policy is needed to resolve this problem. Water pollution increases death and disease rates; about 14000 deaths are recorded every day due to water pollution. Many factors cause water pollution nowadays, such as industrial wastes, atmospheric pollutants, pesticides, and herbicides.
Water is purified by biological and physico-chemical processes that occur at the air-water interface and within the bulk water, over the surface of the sand and within the bed of sand. Because of their wide surface area, excellent mechanical strength, high chemical reactivity, and low cost, nanoscale composite materials have enormous potential to filter water in a variety of ways. Metal, carbon, metal oxide, and polymer nanocomposites play an active role in water purification.
In the literature, many distributions can be used to model real data sets, but these distributions do not always improve the goodness of fit and may fail to capture the non-monotonicity of the data. For example, the Weibull, Gamma, and Exponential distributions fail to model data with a nonmonotonic hazard rate. To overcome the limitations of these distributions, researchers are working to define new models. The details are given in the Material and Methodology section.
The aim of this paper is to define a new probability model and to test distributions with application to the data set of water purification using vinyl chloride. The following points are of interest:
(1) To propose a model that can fit data sets with both monotonic and nonmonotonic hazard rate functions
(2) To increase the goodness of fit as compared with other models
The article is arranged as follows. Materials and methods are discussed in Section 2. Section 3 introduces the FFP lifetime distribution, presents the expressions for its density, distribution, survival, and hazard rate functions, and discusses two propositions. In Section 4, mathematical properties of the proposed distribution are discussed. Parameter estimation is carried out in Section 5. A simulation study is conducted in Section 6. In Section 7, an application study is carried out to show the flexibility of the FFP model. We present concluding remarks in Section 8.
2. Material and Methodology
Statistical distributions have become valuable in describing and predicting real-world applications. The basic idea behind developing these distributions is that the lifetime of a system with Y elements (a discrete random variable), where the positive continuous random variable $X_i$ denotes the lifetime of the $i$th element, can be represented by the nonnegative random variable $T = \max(X_1, \dots, X_Y)$ if the elements are in parallel or $T = \min(X_1, \dots, X_Y)$ if the elements are in series. Recently, various distributions to model lifetime data have been introduced; for example, Tahmasebi and Jafari proposed a new family named the Generalized Gompertz-Power Series distributions. Rafique and Saud proposed a lifetime distribution with modified upside-down bathtub, increasing, decreasing, and reverse J-shaped hazard rates by compounding the exponentiated generalized Frechet distribution with the geometric distribution. Another compounded inverse Weibull distribution with monotone and nonmonotone failure rates was proposed by Chakrabarty and Chowdhury. Later, Joshi and Dhungana, Louzada et al., and Ibrahim came up with similar studies of the Exponentiated Rayleigh Poisson, Quasi Xgamma-Poisson, and Poisson Rayleigh Burr XII distributions.
The Exponentiated Generalized Frechet (EGF) distribution was introduced by Cordeiro et al. It is used in various fields of engineering to model the statistical behavior of material properties and to determine failure characteristics such as useful life, infant mortality, and wear-out periods. It is an appropriate model among continuous extreme value distributions. Extreme value distributions are needed to model the maximum or minimum of a collection of random observations. Extreme value theory (EVT) has been used to predict probabilities for the maximum or minimum of extreme values and to measure events occurring with a small probability.
Let X be a continuous random variable following the EGF distribution with positive parameters. Its probability density function (PDF) and cumulative distribution function (CDF) are, respectively, defined by
where the parameters comprise shape parameters and a scale parameter.
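For reference, applying the exponentiated generalized construction of Cordeiro et al. to a Frechet baseline gives the form sketched below. The symbol names $\alpha$, $\beta$, $\lambda$ (shape) and $\sigma$ (scale) are assumptions made here for illustration, and the parameterization may differ from the one used by the authors.

```latex
% Assumed notation: \alpha, \beta, \lambda > 0 (shape), \sigma > 0 (scale), x > 0.
\begin{aligned}
F(x) &= \left[ 1 - \left\{ 1 - e^{-(\sigma/x)^{\lambda}} \right\}^{\alpha} \right]^{\beta}, \\
f(x) &= \alpha \beta \lambda \sigma^{\lambda} x^{-(\lambda+1)} e^{-(\sigma/x)^{\lambda}}
        \left\{ 1 - e^{-(\sigma/x)^{\lambda}} \right\}^{\alpha-1}
        \left[ 1 - \left\{ 1 - e^{-(\sigma/x)^{\lambda}} \right\}^{\alpha} \right]^{\beta-1}.
\end{aligned}
```

Setting $\alpha = \beta = 1$ in this form recovers the ordinary Frechet CDF $e^{-(\sigma/x)^{\lambda}}$.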
Let Y be a discrete random variable following a zero-truncated Poisson distribution with a positive rate parameter. The probability mass function of Y is given as
Now, assume a sequence of iid random variables $X_1, \dots, X_Y$ distributed as X and independent of Y, from which the lifetime T is defined.
Then, we get
The joint probability density function of (T, Y) is
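The compounding step can be written out as follows. This is a sketch under stated assumptions: $\theta$ is an assumed name for the Poisson parameter, $G$ and $g$ denote the EGF CDF and PDF, and $T$ is taken to be the maximum of the $X_i$ (the parallel case). The series (minimum) case follows the same pattern with $1 - G$ in place of $G$.

```latex
% Assumptions: \theta > 0; T = \max(X_1, \dots, X_Y); G, g are the EGF CDF and PDF.
\begin{aligned}
P(Y = y) &= \frac{\theta^{y}}{y!\,(e^{\theta} - 1)}, \qquad y = 1, 2, \dots \\
P(T \le t \mid Y = y) &= G(t)^{y}, \\
f_{T,Y}(t, y) &= \frac{y\, g(t)\, G(t)^{y-1}\, \theta^{y}}{y!\,(e^{\theta} - 1)}, \\
f_{T}(t) &= \sum_{y=1}^{\infty} f_{T,Y}(t, y)
          = \frac{\theta\, g(t)\, e^{\theta G(t)}}{e^{\theta} - 1}, \qquad
F_{T}(t) = \frac{e^{\theta G(t)} - 1}{e^{\theta} - 1}.
\end{aligned}
```

The marginal density in the last line uses $\sum_{y \ge 1} (\theta G)^{y-1}/(y-1)! = e^{\theta G}$.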
3. The FFP Distribution
In this section, we use (4) to derive a novel probability model, called the Flexible Frechet Poisson (FFP) distribution, as the marginal distribution of T with the following PDF and CDF:
The survival function and hazard function of T are given by
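The four functions of this section can be evaluated numerically as in the minimal sketch below. It assumes the parallel (maximum) construction, so that the CDF of T is $(e^{\theta G(t)} - 1)/(e^{\theta} - 1)$ with $G$ the EGF CDF, and it assumes the standard exponentiated generalized Frechet form for $G$. The function and parameter names (`theta`, `alpha`, `beta`, `lam`, `sigma`) are hypothetical and not taken from the paper.

```python
import math

# Assumed names: theta is the Poisson parameter; alpha, beta, lam are EGF
# shape parameters and sigma is the EGF scale. Extreme arguments are not guarded.

def egf_cdf(x, alpha, beta, lam, sigma):
    """Assumed EGF CDF: [1 - (1 - exp(-(sigma/x)**lam))**alpha]**beta."""
    if x <= 0:
        return 0.0
    h = math.exp(-((sigma / x) ** lam))  # Frechet baseline CDF
    return (1.0 - (1.0 - h) ** alpha) ** beta


def egf_pdf(x, alpha, beta, lam, sigma):
    """Derivative of egf_cdf with respect to x."""
    if x <= 0:
        return 0.0
    h = math.exp(-((sigma / x) ** lam))
    dh = lam * sigma ** lam * x ** (-(lam + 1.0)) * h  # Frechet baseline PDF
    return (alpha * beta * dh * (1.0 - h) ** (alpha - 1.0)
            * (1.0 - (1.0 - h) ** alpha) ** (beta - 1.0))


def ffp_cdf(t, theta, alpha, beta, lam, sigma):
    """CDF of T = max(X_1..X_Y) with Y zero-truncated Poisson (assumption)."""
    return math.expm1(theta * egf_cdf(t, alpha, beta, lam, sigma)) / math.expm1(theta)


def ffp_pdf(t, theta, alpha, beta, lam, sigma):
    """PDF obtained by differentiating ffp_cdf."""
    g = egf_pdf(t, alpha, beta, lam, sigma)
    big_g = egf_cdf(t, alpha, beta, lam, sigma)
    return theta * g * math.exp(theta * big_g) / math.expm1(theta)


def ffp_survival(t, theta, alpha, beta, lam, sigma):
    """Survival function S(t) = 1 - F(t)."""
    return 1.0 - ffp_cdf(t, theta, alpha, beta, lam, sigma)


def ffp_hazard(t, theta, alpha, beta, lam, sigma):
    """Hazard rate h(t) = f(t) / S(t)."""
    return (ffp_pdf(t, theta, alpha, beta, lam, sigma)
            / ffp_survival(t, theta, alpha, beta, lam, sigma))


if __name__ == "__main__":
    params = (0.8, 2.0, 1.5, 1.5, 1.0)  # illustrative values only
    for t in (0.5, 1.0, 2.0):
        print(t, ffp_cdf(t, *params), ffp_hazard(t, *params))
```

Using `math.expm1` keeps the ratio accurate when `theta` is close to zero, which is the regime where the compound CDF approaches the EGF CDF.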
Proposition 1. The density function of the FFP model can be expressed as an infinite mixture of EGF densities using PDF form (5).
The Taylor series formula for real is given by
The binomial expansion for a real, noninteger, is
Proposition 2. The EGF distribution is a limiting form of the FFP class of distributions as the Poisson parameter tends to zero. In this limit, the FFP CDF reduces to the CDF of the EGF distribution.
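A short worked version of this limit, assuming the parallel (maximum) form of the compound CDF with $\theta$ as an assumed name for the Poisson parameter and $G$ the EGF CDF, follows from L'Hopital's rule:

```latex
% Assumption: F_T(t) = (e^{\theta G(t)} - 1) / (e^{\theta} - 1), \theta > 0.
\lim_{\theta \to 0^{+}} \frac{e^{\theta G(t)} - 1}{e^{\theta} - 1}
  = \lim_{\theta \to 0^{+}} \frac{G(t)\, e^{\theta G(t)}}{e^{\theta}}
  = G(t).
```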
4. Statistical Properties
In this section, we obtain expressions for statistical measures such as the moments, the quantile function, and the Lorenz and Bonferroni curves.
4.1. Quantile Function and Random Number Generator
The quantile function gives the value of the random variable at a specified probability level. The quantile of any probability distribution can be obtained by solving $F(x_d) = d$ for $x_d$, where $0 < d < 1$.
Substituting d = 0.25, 0.5, and 0.75 gives the first, second, and third quartiles of the FFP model, respectively.
The random number generator of the FFP distribution is obtained from the quantile function by replacing d with R, where R represents a random number that follows the uniform distribution on (0, 1).
For illustration, we considered different (hypothetical) sets of parameter values, subject to the condition that the parameter values must be positive. The quantiles of the FFP distribution for different sets of parameters are shown in Table 1.
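Inversion of the CDF can be sketched in code as follows. The closed form used here is an assumption: it inverts the parallel (maximum) compound CDF $(e^{\theta G} - 1)/(e^{\theta} - 1)$ with the standard EGF form for $G$, and the names `ffp_quantile`, `ffp_random`, and the parameter names are hypothetical. The argument `d` plays the role of the probability level in the text.

```python
import math
import random


def ffp_cdf(t, theta, alpha, beta, lam, sigma):
    """Assumed FFP CDF (maximum of a zero-truncated Poisson number of EGF lifetimes)."""
    if t <= 0:
        return 0.0
    h = math.exp(-((sigma / t) ** lam))          # Frechet baseline CDF
    big_g = (1.0 - (1.0 - h) ** alpha) ** beta   # EGF CDF
    return math.expm1(theta * big_g) / math.expm1(theta)


def ffp_quantile(d, theta, alpha, beta, lam, sigma):
    """Solve ffp_cdf(x) = d for x by inverting each layer in turn."""
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie strictly between 0 and 1")
    p = math.log1p(d * math.expm1(theta)) / theta          # EGF probability level
    h = 1.0 - (1.0 - p ** (1.0 / beta)) ** (1.0 / alpha)   # Frechet probability level
    return sigma * (-math.log(h)) ** (-1.0 / lam)


def ffp_random(n, theta, alpha, beta, lam, sigma, seed=None):
    """Draw n variates by feeding uniform(0, 1) numbers R into the quantile function."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        r = rng.random()
        if r > 0.0:  # random() can return exactly 0.0, which is outside (0, 1)
            draws.append(ffp_quantile(r, theta, alpha, beta, lam, sigma))
    return draws


if __name__ == "__main__":
    params = (0.8, 2.0, 1.5, 1.5, 1.0)  # illustrative values only
    for d in (0.25, 0.5, 0.75):
        print(d, ffp_quantile(d, *params))
    print(ffp_random(5, *params, seed=1))
```

Because the quantile function is available in closed form under these assumptions, no root finding is needed to simulate from the model.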
4.2. Moments
The moments of a probability distribution are of major importance in statistical analysis and in applications to real data sets. The moments are obtained as follows:
The mean and variance of the FFP distribution are
The descriptive measures for different parameter values for the FFP distribution are given in Table 2.
4.3. Order Statistics
Consider the order statistics of a sample of size n from the FFP distribution. Then, the PDF of the ith order statistic is given by
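For orientation, the general density of the ith order statistic from any continuous parent distribution with CDF $F$ and PDF $f$ is shown below. The FFP-specific expression follows by substituting the FFP PDF and CDF; the subscript notation $i{:}n$ is an assumption made here.

```latex
% F and f are the parent CDF and PDF; i = 1, \dots, n.
f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, f(x)\, F(x)^{i-1}\, \{1 - F(x)\}^{n-i}.
```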
4.4. Mean Deviation
In statistics, the amount of scatter in a population is measured by the mean deviation about the mean and about the median. Let X be a random variable from the FFP distribution with a given CDF, mean, and median. Then the mean deviation about the mean and the mean deviation about the median are given by
respectively, where $\Gamma(\cdot, \cdot)$ is the upper incomplete gamma function.
4.5. Lorenz and Bonferroni Curves
The Bonferroni curve was proposed by Bonferroni and is closely related to the Lorenz curve. Both curves are used to measure the distribution of wealth and income and are also applicable in demography, insurance, medicine, and so on. The information in the Lorenz curve can be summarized by the Gini coefficient. The Lorenz and Bonferroni curves of the FFP distribution are given by
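The standard definitions behind these curves, for a nonnegative random variable with PDF $f$, CDF $F$, and mean $\mu$, are recalled below. The symbols $L$, $B$, $q$, and $p$ are assumed notation, and the FFP-specific expressions follow by substituting the FFP density.

```latex
% q is the p-th quantile of the distribution, 0 < p \le 1.
L(p) = \frac{1}{\mu} \int_{0}^{q} x f(x)\, dx, \qquad
B(p) = \frac{L(p)}{p}, \qquad
q = F^{-1}(p), \qquad
\text{Gini} = 1 - 2 \int_{0}^{1} L(p)\, dp .
```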
5. Parameter Estimation
Let $x_1, x_2, \dots, x_n$ be a random sample of size n from the FFP distribution. The log-likelihood function can be written as
To obtain the MLEs, we take the partial derivatives of the log-likelihood with respect to each of the five parameters and equate them to zero. The associated components of the score function are as follows:
The maximum likelihood estimate (MLE) of the parameter vector is obtained by solving the resulting nonlinear system of score equations. These equations cannot be solved analytically, so statistical software is used to solve them numerically with iterative techniques such as a Newton–Raphson type algorithm.
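As an illustration of the objective being maximized, the log-likelihood takes the following shape if one assumes the parallel (maximum) construction and the standard EGF form. The symbols $\theta$, $\alpha$, $\beta$, $\lambda$, $\sigma$ and the shorthand $h_i$ are assumptions introduced here, not the paper's notation.

```latex
% Assumptions: f_T(t) = \theta g(t) e^{\theta G(t)} / (e^{\theta} - 1),
% with h_i = e^{-(\sigma/x_i)^{\lambda}} and G(x_i) = [1 - (1 - h_i)^{\alpha}]^{\beta}.
\begin{aligned}
\ell &= n \ln \theta - n \ln\!\left(e^{\theta} - 1\right)
      + \sum_{i=1}^{n} \ln g(x_i) + \theta \sum_{i=1}^{n} G(x_i), \\
\ln g(x_i) &= \ln(\alpha \beta \lambda) + \lambda \ln \sigma - (\lambda + 1) \ln x_i
      - (\sigma / x_i)^{\lambda} + (\alpha - 1) \ln(1 - h_i)
      + (\beta - 1) \ln\!\left[ 1 - (1 - h_i)^{\alpha} \right].
\end{aligned}
```

Setting the five partial derivatives of $\ell$ to zero gives the score equations, which are then solved iteratively.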
6. Simulation Study
We performed a Monte Carlo simulation study to demonstrate the performance of the ML estimates for the FFP distribution at different sample sizes. R statistical software is used to obtain the maximum likelihood estimates (MLEs), their standard errors, bias, and root mean square error (RMSE). The experiment is repeated 1000 times for each of the sample sizes n = 30, 50, 75, 100, 300, and 500. In each trial, the estimates are obtained by maximum likelihood, using the Newton–Raphson method to obtain an approximate solution. For this purpose, we took 0.2, 0.5, 1.5, 0.2, and 0.12 as the parameter values (initial guesses, assumed to be the true values), subject to the condition that these values must be positive. The means of the ML estimates along with their standard errors, bias, and root mean square error are computed.
From the results in Table 3, it can be noticed that
(i) RMSE decreases as the sample size increases
(ii) Bias decreases as the sample size increases
(iii) Estimates of the unknown parameters are closer to the true values as the sample size increases
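The summary measures reported in such a study can be computed as in the sketch below. To keep the example short and dependency-free, it uses a stand-in estimator (the MLE of an exponential rate, which has a closed form) rather than the FFP MLE, which requires numerical optimization. The function names and settings are hypothetical.

```python
import math
import random


def summarize_estimates(estimates, true_value):
    """Return (mean estimate, bias, RMSE) over Monte Carlo replications."""
    m = len(estimates)
    mean_estimate = sum(estimates) / m
    bias = mean_estimate - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    return mean_estimate, bias, rmse


def simulate_rate_mle(n, true_rate, replications, seed):
    """Stand-in experiment: MLE of an exponential rate, n / sum(sample)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimates.append(n / sum(sample))
    return estimates


if __name__ == "__main__":
    true_rate = 1.5  # illustrative value only
    for n in (30, 100, 500):
        estimates = simulate_rate_mle(n, true_rate, replications=200, seed=n)
        print(n, summarize_estimates(estimates, true_rate))
```

Replacing `simulate_rate_mle` with a routine that simulates FFP samples and fits them numerically would reproduce the structure of the study described above, one parameter at a time.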
7. Application
To demonstrate the applicability of the FFP distribution and compare it with the Exponentiated Generalized Frechet (EGF), Generalized Frechet (GF), and Frechet (F) distributions, an application to water purification data is used. The ML estimates, standard errors, and goodness-of-fit measures, i.e., AIC, BIC, Kolmogorov–Smirnov (KS), Cramér–von Mises, and Anderson–Darling, are computed for the four models.
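Three of these measures can be computed from a fitted model as in the following sketch, which uses the usual definitions AIC = 2k - 2l and BIC = k ln(n) - 2l, where l is the maximized log-likelihood, k the number of parameters, and n the sample size, together with the two-sided KS distance. The function names are hypothetical, and the numbers in the usage lines are placeholders rather than results from the paper.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        distance = max(distance, i / n - f, f - (i - 1) / n)
    return distance


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(aic(-10.0, 2), bic(-10.0, 2, 34))
    print(ks_statistic([0.25, 0.75], lambda x: x))
```

Lower values of all three measures indicate a better fit, which is the criterion used to rank the four models.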
7.1. Vinyl Chloride Data
The data consist of 34 observations and were analyzed by Bhaumik et al. for testing the parameters of the Gamma distribution. The data are vinyl chloride concentrations (in mg/l) obtained from clean upgradient monitoring wells. The observations are given in Table 4. The maximum likelihood estimates of the model parameters, the test statistics, and the corresponding p-values are presented in Tables 5 and 6. The suggested model has the lowest values of the test statistics, which shows that the FFP distribution is more suitable for modeling the water purification data. Figure 3 shows the theoretical and empirical PDF and CDF for these data. Figure 4 presents the TTT plot, which shows a decreasing tendency of the hazard function. Thus, according to all these graphs and tests, the FFP model is a good fit for this data set.
8. Conclusion
This paper introduced a novel probability model for water purification data on vinyl chloride. The proposed model is called the Flexible Frechet Poisson (FFP) distribution. Various statistical properties are derived, such as the quantile function, survival function, hazard rate function, moments, and mean deviation. The maximum likelihood procedure was applied to estimate the model parameters, and the precision of the estimates was supported by a simulation study. In the application to water data, the suggested model is compared with three existing models, EGF, GF, and F, and tested for goodness of fit. The results showed that our model provides a better fit than the other existing models.
Data Availability
The data set is taken from the literature, and its source is cited in the reference list.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
R. Damo and P. Icka, “Evaluation of water quality index for drinking water,” Polish Journal of Environmental Studies, vol. 22, no. 4, 2013.
A. E. Gurzau, E. Popovici, A. Pintea, O. Popa, C. Pop, and I. Dumitrascu, “Quality of surface water sources from a central Transylvanian area as a possible problem for human security and public health,” Carpathian Journal of Earth and Environmental Sciences, vol. 5, no. 2, pp. 119–126, 2010.
A. Rafique and N. Saud, “Exponentiated generalized Frechet geometric distribution,” in Proceedings of the 15th Islamic Countries Conference on Statistical Sciences (ICCS-15), p. 189, Lahore, Pakistan, March 2019.
F. Louzada, P. Luiz Ramos, and P. Henrique Ferreira, “Exponential-Poisson distribution: estimation and applications to rainfall and aircraft data with zero occurrence,” Communications in Statistics - Simulation and Computation, vol. 49, no. 4, pp. 1024–1043, 2020.
M. Ibrahim, “The compound Poisson Rayleigh Burr XII distribution: properties and applications,” Journal of Applied Probability and Statistics, vol. 15, no. 1, pp. 73–97, 2020.
C. Bonferroni, Elementi di statistica generale [Elements of general statistics], Libreria Seber, Firenze, Italy, 1930.