Journal of Probability and Statistics

Volume 2014 (2014), Article ID 864965, 6 pages

http://dx.doi.org/10.1155/2014/864965

## A Study of Probability Models in Monitoring Environmental Pollution in Nigeria

^{1}Department of Mathematics, Covenant University, Ota, Ogun State, Nigeria

^{2}Department of Statistics, University of Ilorin, Ilorin, Nigeria

Received 29 January 2014; Revised 17 April 2014; Accepted 18 April 2014; Published 5 May 2014

Academic Editor: Zhidong Bai

Copyright © 2014 P. E. Oguntunde et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In Lagos State, Nigeria, pollutant emissions were monitored across the state to detect any significant change that may harm human health and the environment at large. In this research, three theoretical distributions (Weibull, lognormal, and gamma) were fitted to the carbon monoxide observations to determine the best fit. The characteristics of the pollutant observations were established, and the probabilities of exceeding the acceptable limits of the Lagos State Environmental Protection Agency (LASEPA) and the Federal Environmental Protection Agency (FEPA) were predicted. Increases in the use of vehicles and in the establishment of industries were found not to contribute significantly to the high level of carbon monoxide concentration in Lagos State for the period studied.

#### 1. Introduction

It is common knowledge that population growth and globalization have become the major drivers of pollution. Among the various forms of pollution, a large number of studies investigating the relationship between air quality and health effects cite air pollution as the major environmental issue of concern to the community. Increased hospitalization and emergency room attendance, as well as decreased lung function, have been associated with the following common air pollutants: carbon monoxide (CO), nitrogen oxides (NO_{x}), inhalable particles (measured as PM_{10}), photochemical oxidants (measured as ozone), and sulphur dioxide (SO_{2}).

Air pollution is defined as the presence in the outdoor atmosphere of one or more pollutants in such quantities and of such duration that may tend to be injurious to human, plant, or animal life or property or which may unreasonably interfere with the comfortable enjoyment of life or property or the conduct of business [1, 2].

In this research work, the emphasis is on one of these criteria pollutants, carbon monoxide, because of the major threat it poses to human health.

Carbon monoxide is a colourless, odourless, and highly poisonous gas produced in large quantities by the incomplete combustion of fossil fuels. Its main source is motor vehicle exhaust (vehicular emission); about two-thirds of the pollutant emissions come from transportation sources, while other sources include industrial processes and open burning activities [3, 4].

The natural concentration of carbon monoxide in air is around 0.2 ppm, an amount that is not harmful to humans, while exposure to the pollutant at 100 ppm or greater can be dangerous to human health. Carbon monoxide endangers humans specifically through its tendency to combine with haemoglobin in the blood. This combination produces carboxyhaemoglobin (COHb), which reduces the capacity of the blood to carry oxygen [5]. The acute effects produced by exposure to carbon monoxide (in parts per million) are given in Table 1.

Probability models have been applied successfully to many physical phenomena such as wind speed, rainfall, river discharges, and air quality. They have been used to fit vehicular emission data in Chennai, India, in order to predict the concentration of carbon monoxide in the ambient atmosphere [6, 7]. In that research, ten standard probability models were fitted to the data and goodness of fit was assessed using the Kolmogorov-Smirnov and Anderson-Darling tests.

When the parent probability distribution of air pollutants is correctly chosen, the specific distribution can be used to predict the mean concentration and probability of exceeding a critical concentration [5, 8].

The objectives of this paper are to fit three probability distributions (Weibull, lognormal, and gamma) to the concentration of carbon monoxide in Lagos State, Nigeria, to determine the “best” distribution to describe the data, and to establish the distribution of carbon monoxide concentration with a view to predicting the probability that the concentration would exceed a critical or an acceptable concentration.

To this effect, observations on the pollutant concentration were collected (as available) between the years 2004 and 2010. As vehicular exhaust (emission) is the major source of carbon monoxide, information was also collected on the number of newly registered vehicles and the number of newly registered industries in Lagos State between the years 2004 and 2010.

#### 2. Methodology

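*Weibull Distribution.* Let $X$ denote a random variable; the two-parameter Weibull density function [9] is given by

$$f(x; \alpha, \beta) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha - 1} \exp\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right], \quad x > 0,$$

where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter.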

*Lognormal Distribution.* A random variable $X$ is log-normally distributed if $\ln X$ is normally distributed. Its probability density function [10] is given by

$$f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}\right], \quad x > 0,$$

where $\mu$ is the location parameter, which is also the mean of $\ln X$, and $\sigma > 0$ is the scale parameter, which is also the standard deviation of $\ln X$.

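*Gamma Distribution.* Let $X$ denote a random variable; the two-parameter gamma density function [11] with parameters $\alpha$ and $\beta$ is given by

$$f(x; \alpha, \beta) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}, \quad x > 0,$$

where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter.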
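As a minimal sketch of the three densities described in this section, the following Python functions evaluate the standard two-parameter Weibull, lognormal, and gamma forms using only the standard library. The function names and the parameter values in the demonstration are illustrative assumptions and are not estimates from this study.

```python
import math


def weibull_pdf(x, shape, scale):
    """Two-parameter Weibull density; taken as zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def lognormal_pdf(x, mu, sigma):
    """Lognormal density; mu and sigma are the mean and s.d. of ln X."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, scale):
    """Two-parameter gamma density, computed on the log scale for stability."""
    if x <= 0:
        return 0.0
    log_f = ((shape - 1) * math.log(x) - x / scale
             - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_f)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the call pattern.
    x = 5.0
    print("Weibull  :", weibull_pdf(x, shape=0.8, scale=4.0))
    print("Lognormal:", lognormal_pdf(x, mu=1.0, sigma=1.2))
    print("Gamma    :", gamma_pdf(x, shape=0.8, scale=6.0))
```

With a shape parameter of 1, both the Weibull and gamma densities reduce to the exponential density with the same scale, which gives a quick consistency check on the implementation.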

##### 2.1. Methods of Parameter Estimation

The parameters of the distributions can be estimated using various methods, such as maximum likelihood estimation (MLE) and the method of moments (MOM), among others. In this paper, the method of maximum likelihood is used because it is widely applied and has many desirable properties: under standard regularity conditions, the maximum likelihood estimator is consistent, asymptotically normal, and asymptotically efficient, that is, it attains the minimum possible variance in large samples.

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ drawn from a p.d.f. $f(x; \theta)$, where $\theta$ is an unknown parameter. The likelihood function of this random sample is the joint density function of the $n$ random variables, viewed as a function of the unknown parameter [12]. Thus,

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

is the likelihood function. The maximum likelihood estimator (MLE) of $\theta$, say $\hat{\theta}$, is the value of $\theta$ that maximizes $L(\theta)$ or, equivalently, the logarithm of $L(\theta)$. The MLE of $\theta$ is a solution of

$$\frac{d \ln L(\theta)}{d\theta} = 0.$$

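According to [12], the maximum likelihood estimators $\hat{\alpha}$ and $\hat{\beta}$ of the shape and scale parameters of the Weibull distribution are the solution of the simultaneous equations

$$\frac{\sum_{i=1}^{n} x_i^{\hat{\alpha}} \ln x_i}{\sum_{i=1}^{n} x_i^{\hat{\alpha}}} - \frac{1}{\hat{\alpha}} - \frac{1}{n}\sum_{i=1}^{n} \ln x_i = 0, \qquad \hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\hat{\alpha}}\right)^{1/\hat{\alpha}}.$$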

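For the lognormal distribution, the maximum likelihood estimates of $\mu$ and $\sigma^{2}$ are given by

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln x_i, \qquad \hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n} \left(\ln x_i - \hat{\mu}\right)^{2}.$$

Lastly, the maximum likelihood estimators $\hat{\alpha}$ and $\hat{\beta}$ of the shape and scale parameters of the gamma distribution are solutions of the simultaneous equations

$$\hat{\alpha}\hat{\beta} = \bar{x}, \qquad \ln \hat{\alpha} - \psi(\hat{\alpha}) = \ln \bar{x} - \frac{1}{n}\sum_{i=1}^{n} \ln x_i,$$

where $\bar{x}$ is the sample mean and $\psi$ is the digamma function, defined for an argument $\alpha$ as

$$\psi(\alpha) = \frac{d}{d\alpha} \ln \Gamma(\alpha) = \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}.$$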
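The gamma likelihood equations have no closed-form solution for the shape parameter, so they are solved numerically in practice. The sketch below is one possible way to do this in plain Python and is not the procedure used by the software in this study: it approximates the digamma function with a recurrence plus an asymptotic series and finds the shape by bisection on a logarithmic grid. The function names, bracketing interval, and simulated sample are assumptions made for illustration. The method requires strictly positive observations, since it uses $\ln x_i$.

```python
import math
import random


def digamma(x):
    """Approximate psi(x) for x > 0: shift x upward, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += (math.log(x) - 0.5 * inv
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def gamma_mle(data):
    """Return (shape, scale) solving the gamma likelihood equations."""
    n = len(data)
    mean = sum(data) / n
    mean_log = sum(math.log(x) for x in data) / n
    s = math.log(mean) - mean_log  # positive unless all values are equal
    if s <= 0:
        raise ValueError("data must be positive and not all equal")
    lo, hi = 1e-8, 1e8             # assumed bracket for the shape parameter
    for _ in range(200):
        mid = math.sqrt(lo * hi)   # bisection on the log scale
        # ln(a) - psi(a) is decreasing in a, so move right while it exceeds s.
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, mean / shape


if __name__ == "__main__":
    random.seed(1)
    # Simulated data with known shape 2 and scale 3 (illustrative only).
    sample = [random.gammavariate(2.0, 3.0) for _ in range(5000)]
    print(gamma_mle(sample))
```

Because the second likelihood equation fixes the product of the two estimates at the sample mean, only a one-dimensional root search over the shape is needed.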

##### 2.2. Weighted Least Squares

Weighted least squares (WLS) is an efficient method that makes good use of small data sets. Its main advantage over other methods is the ability to handle regression situations in which the data points are of varying quality. If the standard deviation of the random errors in the data is not constant across all levels of the explanatory variables, using WLS with *weights that are inversely proportional to the variance* at each level of the explanatory variables yields the most precise parameter estimates possible. Consider the linear model

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad \operatorname{Var}(\varepsilon_i) = \frac{\sigma^{2}}{w_i}, \quad i = 1, \ldots, n,$$

or, in matrix form, $y = X\beta + \varepsilon$.

Since the sample sizes of the data also vary, the weight $w_i$ used in this research work reflects the sample size behind each observation.

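The WLS estimate of $\beta$ is given by

$$\hat{\beta} = \left(X^{T} W X\right)^{-1} X^{T} W y.$$

The matrix $W$ of weights is given by

$$W = \operatorname{diag}(w_1, w_2, \ldots, w_n).$$

Fitting this model is equivalent to minimizing

$$S(\beta) = \sum_{i=1}^{n} w_i \left(y_i - x_i^{T}\beta\right)^{2},$$

where $x_i^{T}$ is the $i$th row of $X$.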
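The weighted normal equations can be illustrated with a short, dependency-free sketch. The code below forms $X^{T}WX$ and $X^{T}Wy$ directly and solves the resulting system by Gaussian elimination; the helper names `solve_linear` and `wls_fit`, together with the data and weights in the demonstration, are hypothetical and are not the values or the software used in this study.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def wls_fit(rows, y, weights):
    """Return [b0, b1, ...] minimizing sum_i w_i * (y_i - x_i' b)^2.

    rows holds the predictor values for each observation (no intercept column).
    """
    design = [[1.0] + list(r) for r in rows]          # prepend the intercept
    p = len(design[0])
    xtwx = [[sum(w * d[j] * d[k] for w, d in zip(weights, design))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * d[j] * yi for w, d, yi in zip(weights, design, y))
            for j in range(p)]
    return solve_linear(xtwx, xtwy)


if __name__ == "__main__":
    # Hypothetical data: two predictors, six observations, unequal weights.
    rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 2]]
    y = [9.1, 7.8, 22.3, 17.9, 35.2, 19.0]
    weights = [12, 30, 25, 8, 40, 15]
    print(wls_fit(rows, y, weights))
```

Setting all weights equal reduces the estimate to ordinary least squares, which is a convenient check when comparing against a statistical package.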

##### 2.3. Test of Goodness of Fit

In order to verify the goodness of fit of the models to the carbon monoxide observations, the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) tests are used. The lower the value of these statistics, the more closely the fitted distribution matches the data. The hypotheses for the tests are as follows:

*H*_{0}: the data follow a specified distribution

versus

*H*_{1}: the data do not follow the specified distribution.

Given $n$ ordered data points $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, the test statistic for the Kolmogorov-Smirnov test is given as

$$D = \max_{1 \le i \le n}\left(F\left(x_{(i)}\right) - \frac{i - 1}{n},\; \frac{i}{n} - F\left(x_{(i)}\right)\right).$$

The test statistic for the Anderson-Darling test is given by

$$A^{2} = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left[\ln F\left(x_{(i)}\right) + \ln\left(1 - F\left(x_{(n+1-i)}\right)\right)\right],$$

where $F$ is the CDF of the continuous distribution being tested and $x_{(i)}$ are the ordered data.
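Both statistics are straightforward to compute once a fitted CDF is available. The following sketch implements the standard textbook forms of the K-S and A-D statistics for an arbitrary CDF passed in as a function; the function names, the exponential CDF, and the sample values are assumptions for illustration only. Critical values and P values are not computed here.

```python
import math


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical and fitted CDFs."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max(cdf(x) - (i - 1) / n, i / n - cdf(x))
        for i, x in enumerate(xs, start=1)
    )


def ad_statistic(data, cdf):
    """Anderson-Darling statistic A^2; requires 0 < cdf(x) < 1 at every point."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = math.log(cdf(xs[i - 1]))        # ln F(x_(i))
        upper = math.log(1.0 - cdf(xs[n - i]))  # ln(1 - F(x_(n+1-i)))
        total += (2 * i - 1) * (lower + upper)
    return -n - total / n


if __name__ == "__main__":
    # Hypothetical sample and a hypothetical exponential fit with scale 4.
    sample = [0.4, 1.1, 2.5, 3.0, 4.8, 7.2, 9.9, 15.3]

    def fitted_cdf(x):
        return 1.0 - math.exp(-x / 4.0)

    print("K-S:", ks_statistic(sample, fitted_cdf))
    print("A-D:", ad_statistic(sample, fitted_cdf))
```

The A-D statistic weights discrepancies in the tails more heavily than the K-S statistic, which matters when the quantity of interest is an upper-tail exceedance probability.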

##### 2.4. Probability of Exceedance

The probability that carbon monoxide observations would exceed a specified standard or limit is based on the distribution chosen as the best distribution for carbon monoxide concentration in Lagos State for the period studied.

Mathematically, the probability of exceeding a critical concentration $x_c$ [13, 14] is given by

$$P(X > x_c) = 1 - F(x_c),$$

where $F$ is the CDF of the chosen distribution.

#### 3. Summary of the Data Collected

In this section, we provide and describe the information gathered on carbon monoxide concentration, number of newly registered vehicles, and industries.

##### 3.1. Data on Carbon Monoxide Concentration

This section provides information on the secondary data collected on the concentration of carbon monoxide measured in parts per million (ppm) in Lagos State (as available) from August 2004 to August 2010. The data was collected as daily data but we could only gather 412 data points for the years considered (e.g., there was no record at all for the year 2007, as shown in Table 7). The data has been summarized in Table 2 giving the minimum and maximum values of the measurement recorded, the standard deviation, mean, and the mode of the observations.

The diagrammatic representation of the data on carbon monoxide concentration (ppm) is given in Figure 1.

It can be deduced from Figure 1 that the carbon monoxide concentration data (as collected) are positively skewed and that the mode occurs at 0 ppm. This justifies the use of positively skewed theoretical distributions to model the data set in this paper.

##### 3.2. Data on Registered Vehicles

In this section, we provide information on the number of vehicles (trucks, buses, and cars) that were registered in Lagos State each year between the years 2004 and 2010. The data is provided in Table 3.

The graphical representation of the number of newly registered vehicles is given in Figure 2.

It can be observed from Figure 2 that there was a slight decline in the number of registered vehicles in 2006, followed by a sharp increase in 2007; the highest number of registrations was recorded in 2008.

##### 3.3. Data on Registered Industries

Table 4 shows the summary of the information collected on the number of newly registered industries (manufacturing industries) in Lagos State between 2004 and August 2010. It should be noted that there are more industries in Lagos State than those captured in this paper; only registered manufacturing industries are considered.

The graphical representation of the number of newly registered industries is given in Figure 3.

It can be seen in Figure 3 that only a few manufacturing industries were registered. In addition, the number of registrations increased from 2004 to 2010, except in 2009, when there was a slight decline.

#### 4. Analysis and Results

The parameters of the distributions under study (Weibull, lognormal, and gamma) were estimated by fitting the distributions to the carbon monoxide concentration data using the EasyFit statistical package.

##### 4.1. Test of Goodness of Fit

In an attempt to choose the “best” probability model to describe the concentration of carbon monoxide in Lagos State for the period studied, Kolmogorov-Smirnov goodness of fit test was conducted. The summary of the analysis is given in Table 6.

The graph of the cumulative distribution function (CDF) of the three distributions is shown in Figure 4.

This graph shows how well the Weibull, lognormal, and gamma distributions fit the data. It can be seen that the CDF of the gamma distribution is the closest to the empirical CDF of the carbon monoxide concentration.

##### 4.2. Probability of Exceeding Critical Concentrations

Since the gamma distribution fits the data better than the other fitted distributions, the probabilities that the carbon monoxide concentration would exceed the Lagos State Environmental Protection Agency (LASEPA) standard (5 ppm) and the Federal Environmental Protection Agency (FEPA) standard (10 ppm) are calculated from the cumulative distribution function (CDF) of the gamma distribution.

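The probability density function of a gamma distribution with parameters *α* and *β* is given by

$$f(x) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}, \quad x > 0,$$

and the cumulative distribution function (CDF) is

$$F(x) = \frac{\gamma(\alpha, x/\beta)}{\Gamma(\alpha)} = \frac{1}{\Gamma(\alpha)} \int_{0}^{x/\beta} t^{\alpha - 1} e^{-t}\, dt,$$

where $\gamma(\cdot, \cdot)$ is the lower incomplete gamma function.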

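Using the estimates of the shape parameter $\alpha$ and the scale parameter $\beta$ from Table 5,

$$F(5) = 0.699181.$$

Hence, the probability that the carbon monoxide concentration would exceed the LASEPA standard is

$$P(X > 5) = 1 - F(5) = 0.300819.$$

Also,

$$F(10) = 0.768379.$$

Then, the probability that the carbon monoxide concentration would exceed the FEPA standard is

$$P(X > 10) = 1 - F(10) = 0.231621.$$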
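The exceedance calculation can be sketched in plain Python by evaluating the regularized lower incomplete gamma function with its power series. This is an illustrative implementation only: the function names are assumptions, and the shape and scale values in the demonstration are hypothetical placeholders, not the Table 5 estimates, so the printed values are not expected to match the probabilities reported above.

```python
import math


def gamma_cdf(x, shape, scale, tol=1e-12, max_terms=10000):
    """Gamma CDF via the power series for the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape             # k = 0 term of sum z^k / (a (a+1) ... (a+k))
    total = term
    k = 0
    while abs(term) > tol * abs(total) and k < max_terms:
        k += 1
        term *= z / (shape + k)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def exceedance_probability(limit, shape, scale):
    """P(X > limit) = 1 - F(limit) for a gamma-distributed concentration."""
    return 1.0 - gamma_cdf(limit, shape, scale)


if __name__ == "__main__":
    # Hypothetical parameters; substitute the fitted estimates in practice.
    shape, scale = 0.5, 8.0
    for name, limit in (("LASEPA", 5.0), ("FEPA", 10.0)):
        p = exceedance_probability(limit, shape, scale)
        print(name, "limit", limit, "ppm: P(X > limit) =", p)
```

The series converges for every positive argument but needs more terms as the ratio of the limit to the scale grows; for very large ratios a continued-fraction form is the usual alternative.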

##### 4.3. Linear Regression Modelling

The mean yearly carbon monoxide concentration in ppm ($Y$) is regressed on the number of newly registered vehicles ($X_1$) and the number of newly registered industries ($X_2$). No carbon monoxide data were available for the year 2007; therefore, that year is excluded from the regression analysis. Table 7 shows the summary of the data used for the regression analysis.

Using the MINITAB statistical package, regressing $Y$ on both variables $X_1$ and $X_2$ gives the results shown in Table 8.

The regression equation is

$$\hat{Y} = \hat{\beta}_0 - 0.000276\, X_1 + 0.049\, X_2, \tag{26}$$

where $\hat{\beta}_0$ denotes the estimated intercept.

Equation (26) is interpreted as follows.

There will be a decrease of 0.000276 in $Y$ for a unit increase in $X_1$ when $X_2$ is held fixed, and there will be an increase of 0.049 in $Y$ for a unit increase in $X_2$ when $X_1$ is held fixed.

*Analysis of Variance (ANOVA)*

*Hypothesis.* *H*_{0}: $\beta_1 = \beta_2 = 0$ versus *H*_{1}: $\beta_j \neq 0$ for at least one $j$ ($j = 1, 2$).

*Decision Rule.* Reject *H*_{0} if the $P$ value is less than the level of significance $\alpha = 0.05$.

*Decision.* We do not reject *H*_{0} since 0.171 is not less than 0.05.

*Inference.* From Table 8, considering the respective $P$ values for the parameters $\beta_1$ and $\beta_2$, the regression parameters are not significantly different from zero, with *R*-Sq = 69.2% and *R*-Sq (adjusted) = 48.7%.

Also, from Table 9, based on the $P$ value (0.171), we conclude that the regression model is not significant at $\alpha = 0.05$.
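The reported ANOVA figures can be cross-checked from *R*-Sq alone. The sketch below assumes six yearly observations (2004 to 2010 with 2007 excluded) and two predictors, giving 2 and 3 degrees of freedom; the sample size is inferred from the text rather than read from Table 7, and the function names are illustrative. The closed-form tail probability used here holds only when the numerator degrees of freedom equal 2.

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """F statistic for H0: all slope coefficients are zero, from R-squared."""
    df1 = n_predictors
    df2 = n_obs - n_predictors - 1
    f_stat = (r_squared / df1) / ((1.0 - r_squared) / df2)
    return f_stat, df1, df2


def f_tail_probability_df1_two(f_stat, df2):
    """P(F > f_stat) for an F(2, df2) variable; valid only when df1 == 2."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


if __name__ == "__main__":
    r_sq = 0.692   # R-Sq reported in the text
    n_obs = 6      # assumed: yearly means for 2004-2010 without 2007
    f_stat, df1, df2 = overall_f_statistic(r_sq, n_obs, 2)
    print("F =", f_stat, "on", df1, "and", df2, "degrees of freedom")
    print("P value =", f_tail_probability_df1_two(f_stat, df2))
    print("Adjusted R-Sq =", adjusted_r_squared(r_sq, n_obs, 2))
```

Under these assumptions the computed P value rounds to 0.171 and the adjusted *R*-Sq to 48.7%, in agreement with the values quoted above. With only three residual degrees of freedom, the test has little power, so the lack of significance should be read with that limitation in mind.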

#### 5. Conclusion

In this paper, we have established (based on the data collected) that the distribution of the carbon monoxide observations in Lagos State over the period studied is positively skewed, as shown in Figure 1. The gamma distribution is considered the best distribution for modelling carbon monoxide concentration in Lagos State, as confirmed by the Kolmogorov-Smirnov and Anderson-Darling tests in Table 6. The carbon monoxide concentration in Lagos State exceeds the Lagos State Environmental Protection Agency (LASEPA) and the Federal Environmental Protection Agency (FEPA) standards with probabilities 0.300819 and 0.231621, respectively. Increases in the use of vehicles and in the establishment of industries in Lagos State were not found to contribute significantly to the high carbon monoxide concentration levels. Further research could focus on the age of car engines, the quality of the fuel used for vehicles and machinery, and smoking activities in the state.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### References

1. F. I. Abam and G. O. Unachukwu, “Vehicular emissions and air quality standards in Nigeria,” *European Journal of Scientific Research*, vol. 34, no. 4, pp. 550–560, 2009.
2. L. W. Canter, *Environmental Impact Assessment*, McGraw Hill, New York, NY, USA, 2nd edition, 1996.
3. E. L. Kanyode, *Pollutants emissions measured [LUMES thesis]*, Lund University, Lund, Sweden, 2004.
4. C. A. Ibrahim, *Overview of Air Quality Management in Malaysia*, University Science of Malaysia, 2004.
5. G. Fellenberg, *The Chemistry of Pollution*, John Wiley & Sons, Oxford, UK, 2000.
6. H.-D. Kan and B.-H. Chen, “Statistical distributions of ambient air pollutants in Shanghai, China,” *Biomedical and Environmental Sciences*, vol. 17, no. 3, pp. 366–372, 2004.
7. M. Harikrishna and C. Arun, “Stochastic analysis for vehicular emissions on urban roads—a case study of Chennai,” in *Proceedings of the 3rd International Conference on Environmental and Health*, M. J. Bunch, V. M. Suresh, and T. V. Kumaran, Eds., Chennai, India, December 2003.
8. H.-C. Lu and G.-C. Fang, “Predicting the exceedances of a critical PM_{10} concentration—a case study in Taiwan,” *Atmospheric Environment*, vol. 37, no. 25, pp. 3491–3499, 2003.
9. Y. Lei, “Evaluation of three methods for estimating the Weibull distribution parameters of Chinese pine (*Pinus tabulaeformis*),” *Journal of Forest Science*, vol. 54, no. 12, pp. 566–571, 2008.
10. B. F. Ginos, *Parameter estimation for the lognormal distribution [M.S. thesis]*, Brigham Young University, Provo, Utah, USA, 2009.
11. C. Forbes, M. Evans, N. Hastings, and B. Peacock, *Statistical Distributions*, John Wiley & Sons, Hoboken, NJ, USA, 4th edition, 2011.
12. M. A. Stephens, “EDF statistics for goodness of fit and some comparisons,” *Journal of the American Statistical Association*, vol. 69, no. 347, pp. 730–737, 1974.
13. A. S. Yahaya and N. A. Ramli, “Modelling of carbon monoxide concentration in major towns in Malaysia: a case study in Penang, Kuching and Kuala Lumpur,” *Project Report*, Universiti Sains Malaysia, 2008.
14. A. Zaharim, S. Najid, A. Razali, and K. Sopian, “Analyzing Malaysian wind speed data using statistical distribution,” in *Proceedings of the 4th IASME/WSEAS International Conference on Energy & Environment (EE ’09)*, pp. 363–370, 2009.
15. “Carbon monoxide poisoning; Signs and Symptoms,” http://en.wikipedia.org/wiki/Carbon_monoxide_poisoning.