Research Article  Open Access
Kanisa Chodjuntug, Nuanpan Lawson, "A Chain Ratio Exponential-Type Compromised Imputation for Mean Estimation: Case Study on Ozone Pollution in Saraburi, Thailand", Journal of Probability and Statistics, vol. 2020, Article ID 8864412, 6 pages, 2020. https://doi.org/10.1155/2020/8864412
A Chain Ratio Exponential-Type Compromised Imputation for Mean Estimation: Case Study on Ozone Pollution in Saraburi, Thailand
Abstract
Due to its impact on health and quality of life, Thailand’s ozone pollution has become a major concern among public health investigators. Saraburi Province, an important industrialized area, has some of the highest air pollution levels in Thailand. Unfortunately, the August 2018 Pollution Control Department (PCD) report contained missing values of the ozone concentrations in Saraburi Province. Missing data can significantly affect the data analysis process and must be handled properly before standard statistical techniques are applied. In the presence of missing data, we focus on estimating the ozone mean using an improved compromised imputation method that utilizes the chain ratio exponential technique. Expressions for the bias and mean square error (MSE) of the estimator obtained from the proposed imputation method are derived by the Taylor series method. The performance of the proposed estimator is compared theoretically with that of existing estimators on the basis of their MSEs. In this case study, the percent relative efficiencies indicate that the proposed estimator is the best under certain conditions, and it is then applied to the ozone mean estimation for Saraburi Province in August 2018.
1. Introduction
Air pollution is a global problem with negative effects on both the environment and human health. Many researchers have found that air pollution is associated with mortality and morbidity from lung cancer and from respiratory and cardiovascular diseases, and with the exacerbation of chronic respiratory conditions [1, 2]. Moulton and Yang [3] have shown that air pollution is correlated with Alzheimer’s disease and other neurodegenerative disorders. According to the World Health Organization’s report in 2018, air pollution caused approximately 4.2 million deaths [4]. Children are more vulnerable than adults because their lungs, heart, and brain are still growing. Air pollution is therefore a major public health concern, and monitoring and measuring air quality is critical.
The Pollution Control Department (PCD) in Thailand is an agency that measures the amount of air pollutants such as sulphur dioxide ($\mathrm{SO_2}$), carbon monoxide ($\mathrm{CO}$), nitrogen oxides ($\mathrm{NO}_x$), particulate matter ($\mathrm{PM}$), and ozone ($\mathrm{O_3}$). As shown in the PCD’s Air Quality Management Division report in 2015, and concentrations were higher than standard levels in almost every province [5]. There is also a relationship between and [6, 7]. Naphralan subdistrict, Chaloem Phra Kiat district, is an industrialized area in Saraburi Province with traffic congestion and several stone and cement factories. According to the information from the Saraburi-based ground monitoring stations of the PCD in August 2018 [8], the $\mathrm{O_3}$ concentration data contained missing values due to equipment malfunction or errors in measurement. Missing values can significantly affect the data analysis process and must be handled properly before standard statistical techniques are applied. In environmental research, a number of techniques can be employed to impute missing values in air pollutant concentration data, such as the mean top bottom method, the mean imputation method, the multiple regression method, and artificial neural network models [9–13].
In sample surveys, missing values or nonresponses often occur. There are two types of nonresponse: item nonresponse and unit nonresponse. Imputation is used to handle item nonresponse, and weighting is applied to deal with unit nonresponse. Imputation, which uses available data as a source for replacing missing data, is the most common method for handling missing data.
In addition, many researchers have studied the use of auxiliary information to improve the precision of population mean estimation under simple random sampling without replacement (SRSWOR). For example, Cochran [14] applied auxiliary information at the estimation stage and proposed an estimator of the population mean. Bahl and Tuteja [15] first proposed a ratio-type exponential method for estimating the population mean using information on an auxiliary variable; their method is more efficient than the common mean and ratio methods. Later, Singh and Pal [16] proposed a chain ratio-ratio-type exponential estimator, equation (1), which is more efficient than the common mean, ratio, and ratio-type exponential estimators under certain conditions. In equation (1), $\bar{y}$ is the sample mean of the variable of interest $y$, and $\bar{X}$ and $\bar{x}$ are the population mean and sample mean of the auxiliary variable $x$, respectively.
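For reference, the chain ratio-ratio-type exponential estimator of Singh and Pal [16] is usually written as below. This is a sketch in an assumed notation: the label $\bar{y}_{Re}^{C}$ is chosen here and is not necessarily the authors’ symbol, $\bar{y}$ is the sample mean of $y$, and $\bar{X}$ and $\bar{x}$ are the population and sample means of $x$.

```latex
\[
\bar{y}_{Re}^{C} \;=\; \bar{y}\,\frac{\bar{X}}{\bar{x}}\,
\exp\!\left(\frac{\bar{X}-\bar{x}}{\bar{X}+\bar{x}}\right)
\]
```

The factor $\bar{X}/\bar{x}$ is the usual ratio adjustment, and the exponential factor is the ratio-type exponential adjustment of Bahl and Tuteja [15]; chaining the two gives the estimator its name.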
Similarly, Lee et al. [17] applied auxiliary information for the purpose of imputation. Subsequently, Singh and Horn [18] suggested a compromised imputation method, equation (2), to estimate the population mean. In equation (2), $y_i$ and $x_i$ are the observed values of $y$ and $x$ for the $i$th unit, $R$ and $R^c$ are the sets of responding and nonresponding units, respectively, $n$ and $r$ are the sizes of the sample and of the response data, and $\alpha$ is a chosen constant.
Under this imputation method, the point estimator of the population mean is given by equation (3), where $\bar{y}_r$, $\bar{x}_n$, and $\bar{x}_r$ are the response mean of the variable of interest $y$, the sample mean of the auxiliary variable $x$, and the response mean of the auxiliary variable $x$, respectively.
The bias and mean square error of this estimator are given by equations (4) and (5), respectively, where $C_y$ and $C_x$ are the population coefficients of variation of $y$ and $x$, respectively, $\rho$ is the population correlation coefficient between $y$ and $x$, and $N$, $n$, and $r$ are the sizes of the population, sample, and response data, respectively. Their research showed that the performance of the estimator depends on the suitably chosen constant.
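The compromised imputation of Singh and Horn [18] and the first-order properties of its estimator can be sketched as follows. The notation is assumed for this sketch: $y_{\cdot i}$ denotes the data after imputation, $\hat{b}$ the ratio of the response totals, $\bar{y}_{\mathrm{COMP}}$ the resulting estimator, and $\theta_{r,N} = 1/r - 1/N$ and $\theta_{r,n} = 1/r - 1/n$ are shorthand introduced here. The approximations assume that the responding units behave like a simple random subsample of the sample.

```latex
\begin{align*}
y_{\cdot i} &=
\begin{cases}
\alpha\,\dfrac{n}{r}\,y_i + (1-\alpha)\,\hat{b}\,x_i, & i \in R,\\[1.5ex]
(1-\alpha)\,\hat{b}\,x_i, & i \in R^{c},
\end{cases}
\qquad
\hat{b} = \frac{\sum_{i \in R} y_i}{\sum_{i \in R} x_i},\\[1ex]
\bar{y}_{\mathrm{COMP}} &= \frac{1}{n}\sum_{i=1}^{n} y_{\cdot i}
 = \alpha\,\bar{y}_r + (1-\alpha)\,\bar{y}_r\,\frac{\bar{x}_n}{\bar{x}_r},\\[1ex]
\mathrm{Bias}(\bar{y}_{\mathrm{COMP}}) &\approx
 (1-\alpha)\,\theta_{r,n}\,\bar{Y}\left(C_x^{2} - \rho\,C_y C_x\right),\\[1ex]
\mathrm{MSE}(\bar{y}_{\mathrm{COMP}}) &\approx
 \bar{Y}^{2}\left[\theta_{r,N}\,C_y^{2}
 + \theta_{r,n}\left((1-\alpha)^{2} C_x^{2} - 2(1-\alpha)\,\rho\,C_y C_x\right)\right].
\end{align*}
```

Setting $\alpha = 1$ recovers the response mean $\bar{y}_r$, and setting $\alpha = 0$ recovers the ratio estimator $\bar{y}_r\,\bar{x}_n/\bar{x}_r$, which is why the method is described as a compromise between the two.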
In this study, we aim to use the imputation method to estimate the population mean in the presence of nonresponse occurring in the variable of interest only. We propose to improve the compromised imputation method by using the chain ratio-type exponential technique and its corresponding estimator. The bias and the mean square error are obtained to the first degree of approximation using the Taylor series method. The efficiency of the proposed estimator is compared with that of some existing estimators on the basis of MSE in order to obtain the conditions under which the proposed estimator should be applied. In this case study, we use the percent relative efficiency (PRE) as an indicator to assess the performance of the estimators. Then, the best estimator is applied to estimate the population mean of $\mathrm{O_3}$ as the variable of interest, with the concentration of a second pollutant as the auxiliary variable, using the Saraburi Province data for August 2018.
2. Materials and Methods
2.1. Basic Setup Framework
Let $U$ be a finite population of size $N$, and let $y_i$ and $x_i$ be the values of the variable of interest $y$ and of the auxiliary variable $x$ for the $i$th unit. Let $\bar{Y}$ and $\bar{X}$ be the population means of $y$ and $x$, respectively; both are unknown. Let $R$ and $R^c$ be the sets of responding units and nonresponding units, respectively. The value of $y_i$ is observed for every $i \in R$; meanwhile, the value of $y_i$ is missing for every $i \in R^c$ and is replaced by an imputed value. Under the SRSWOR scheme, a sample $s$ of size $n$ of paired values $(x_i, y_i)$ is selected from $U$; it contains $r$ responding units and $n - r$ nonresponding units. Let $\bar{x}_n$, $\bar{y}_r$, and $\bar{x}_r$ be the sample mean of $x$ and the response means of $y$ and $x$, respectively.
2.2. Existing Imputation Methods and Corresponding Estimation
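Under the mean imputation method, the data after imputation are defined by equation (6): each observed value is kept, and each missing value is replaced by the response mean $\bar{y}_r$ of the variable of interest.
Under this imputation method, the point estimator of the population mean is given by equation (7).
The bias and variance of this estimator are given by equations (8) and (9), respectively.
Under the ratio imputation method, the data after imputation are defined by equation (10): each missing value is replaced by the auxiliary value of that unit multiplied by the ratio of the response totals of $y$ and $x$.
Under this imputation method, the point estimator of the population mean is given by equation (11), where $\bar{x}_n$, $\bar{y}_r$, and $\bar{x}_r$ are the sample mean of $x$ and the response means of $y$ and $x$, respectively.
The bias and mean square error of this estimator are given by equations (12) and (13), respectively.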
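The two existing imputation methods can be illustrated with a minimal sketch. It assumes the standard definitions (a missing value is replaced by the response mean, or by $\hat{b}\,x_i$ with $\hat{b}$ the ratio of the response totals); the function names and the toy numbers are hypothetical and are not taken from the case study.

```python
def impute_mean(y):
    """Mean imputation: replace each missing y (None) with the response mean."""
    observed = [v for v in y if v is not None]
    ybar_r = sum(observed) / len(observed)
    return [ybar_r if v is None else v for v in y]


def impute_ratio(y, x):
    """Ratio imputation: replace each missing y (None) with b_hat * x_i,
    where b_hat is the ratio of the response totals of y and x."""
    pairs = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    b_hat = sum(yi for yi, _ in pairs) / sum(xi for _, xi in pairs)
    return [b_hat * xi if yi is None else yi for yi, xi in zip(y, x)]


def mean(values):
    return sum(values) / len(values)


# Toy sample (made-up numbers): the third unit did not respond.
y = [10.0, 12.0, None, 14.0]
x = [5.0, 6.0, 7.0, 7.0]

mean_estimate = mean(impute_mean(y))       # equals the response mean of y
ratio_estimate = mean(impute_ratio(y, x))  # equals ybar_r * xbar_n / xbar_r
```

Averaging the completed data reproduces the closed-form point estimators, so the imputation step and the estimator can be checked against each other.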
2.3. Proposed Compromised Imputation Method and Corresponding Estimator
Motivated by Singh and Pal [16] and Singh and Horn [18], we propose a new compromised imputation method based on the idea of the chain ratio-type exponential estimator. The data after imputation are defined by equation (14), which involves a suitably chosen constant, selected so that the MSE of the estimator is minimized, together with $\bar{x}_n$, $\bar{y}_r$, and $\bar{x}_r$, the sample mean of $x$ and the response means of $y$ and $x$, respectively.
Under the proposed imputation method, the point estimator of the population mean is given by equation (15).
Note that the proposed estimator has two special cases, obtained at particular values of the constant; one of them is the analogue of the estimator for the population mean proposed by Singh and Pal [16].
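A minimal sketch of a chain ratio-type exponential compromised estimator follows. The functional form is an assumption, obtained by combining the compromise structure of Singh and Horn [18] with the chain ratio-type exponential factor of Singh and Pal [16] evaluated at $\bar{x}_n$ and $\bar{x}_r$; it is not a transcription of equation (15). The function name, the symbol `alpha`, and the toy data are hypothetical.

```python
import math


def chain_ratio_exp_estimate(y, x, alpha):
    """Assumed form:
    alpha * ybar_r
    + (1 - alpha) * ybar_r * (xbar_n / xbar_r)
      * exp((xbar_n - xbar_r) / (xbar_n + xbar_r)).
    Missing values of y are marked with None; x is fully observed."""
    observed = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    r = len(observed)
    ybar_r = sum(yi for yi, _ in observed) / r   # response mean of y
    xbar_r = sum(xi for _, xi in observed) / r   # response mean of x
    xbar_n = sum(x) / len(x)                     # sample mean of x
    chain = ybar_r * (xbar_n / xbar_r) * math.exp(
        (xbar_n - xbar_r) / (xbar_n + xbar_r)
    )
    return alpha * ybar_r + (1.0 - alpha) * chain


# Toy sample (made-up numbers): the third unit did not respond.
y = [10.0, 12.0, None, 14.0]
x = [5.0, 6.0, 7.0, 7.0]

at_one = chain_ratio_exp_estimate(y, x, 1.0)    # reduces to the response mean
at_zero = chain_ratio_exp_estimate(y, x, 0.0)   # pure chain ratio-type exponential form
halfway = chain_ratio_exp_estimate(y, x, 0.5)   # compromise between the two
```

Under this assumed form, `alpha = 1` gives the mean imputation estimator and `alpha = 0` gives the analogue of the Singh and Pal estimator, with intermediate values interpolating linearly between them.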
To find the properties of the proposed estimator, both its bias and MSE are considered up to the first degree of approximation by using the Taylor series method. For this purpose, we define the relative errors of $\bar{y}_r$, $\bar{x}_r$, and $\bar{x}_n$ about the corresponding population means.
Since SRSWOR is being followed, the expectations of these error terms and of their squares and products are given by equation (16), which is expressed in terms of six population parameters.
Naturally, when these population parameters are unknown, the corresponding sample quantities can be used to estimate them.
Next, writing equation (15) in terms of the error terms gives equation (17).
From equation (17), we obtain equation (18).
Taking expectations on both sides of equation (18), we obtain the bias in equation (19).
Squaring both sides of equation (18), expanding, taking expectations, and retaining the terms up to the first degree of approximation, we obtain the MSE in equation (20).
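A first-order expansion of this kind can be sketched as follows, assuming the proposed estimator has the form $\bar{y}_{\mathrm{new}} = \alpha\,\bar{y}_r + (1-\alpha)\,\bar{y}_r\,(\bar{x}_n/\bar{x}_r)\exp\{(\bar{x}_n-\bar{x}_r)/(\bar{x}_n+\bar{x}_r)\}$. This form is an assumption consistent with the works the method builds on [16, 18], not a transcription of equations (15)–(20). The error terms $e_0$, $e_1$, $e_2$, the shorthand $\theta_{r,N} = 1/r - 1/N$ and $\theta_{r,n} = 1/r - 1/n$, and the symbols $\bar{y}_{\mathrm{new}}$ and $\alpha$ are notation chosen for this sketch.

```latex
\begin{align*}
\bar{y}_r &= \bar{Y}(1+e_0), \qquad
\bar{x}_r = \bar{X}(1+e_1), \qquad
\bar{x}_n = \bar{X}(1+e_2),\\[1ex]
\frac{\bar{x}_n}{\bar{x}_r}
\exp\!\left(\frac{\bar{x}_n-\bar{x}_r}{\bar{x}_n+\bar{x}_r}\right)
 &= \frac{1+e_2}{1+e_1}
 \exp\!\left(\frac{e_2-e_1}{2+e_1+e_2}\right)
 \approx 1 + \tfrac{3}{2}\,(e_2 - e_1),\\[1ex]
\bar{y}_{\mathrm{new}} - \bar{Y}
 &\approx \bar{Y}\left[e_0 + \tfrac{3}{2}(1-\alpha)(e_2-e_1)\right],\\[1ex]
\mathrm{MSE}(\bar{y}_{\mathrm{new}})
 &\approx \bar{Y}^{2}\left[\theta_{r,N}\,C_y^{2}
 + \theta_{r,n}\left(\tfrac{9}{4}(1-\alpha)^{2}C_x^{2}
 - 3(1-\alpha)\,\rho\,C_y C_x\right)\right].
\end{align*}
```

The last line uses $E(e_0^2) = \theta_{r,N} C_y^2$, $E[(e_2-e_1)^2] = \theta_{r,n} C_x^2$, and $E[e_0(e_2-e_1)] = -\theta_{r,n}\,\rho\,C_y C_x$, which hold when the responding units behave like a simple random subsample of the sample.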
2.4. Estimation of Optimum Value Constant
In this section, we find the value of the constant that minimizes the MSE. Since the MSE given in equation (20) is a function of the unknown constant, we need to search for the optimum value at which the MSE is minimized. To obtain it, we differentiate equation (20) with respect to the constant and equate the derivative to zero, which gives equation (21).
Solving equation (21) gives the optimum value of the constant in equation (22).
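The minimization step can be checked numerically with a short sketch. It assumes the first-order MSE $\bar{Y}^2[\theta_{r,N}C_y^2 + \theta_{r,n}(\tfrac{9}{4}(1-\alpha)^2C_x^2 - 3(1-\alpha)\rho C_yC_x)]$, which follows from an assumed form of the proposed estimator and is not a transcription of equations (20)–(22). Under that assumption, setting the derivative to zero gives $\alpha_{\mathrm{opt}} = 1 - 2\rho C_y/(3C_x)$. All function names and parameter values below are hypothetical.

```python
def mse_first_order(alpha, ybar, cy, cx, rho, N, n, r):
    """Assumed first-order MSE of the chain ratio-type exponential
    compromised estimator as a function of the constant alpha."""
    theta_rN = 1.0 / r - 1.0 / N
    theta_rn = 1.0 / r - 1.0 / n
    k = 1.0 - alpha
    return ybar**2 * (
        theta_rN * cy**2
        + theta_rn * (2.25 * k**2 * cx**2 - 3.0 * k * rho * cy * cx)
    )


def optimum_alpha(cy, cx, rho):
    """Root of d(MSE)/d(alpha) = 0 for the assumed MSE above."""
    return 1.0 - 2.0 * rho * cy / (3.0 * cx)


# Made-up parameter values, for illustration only.
params = dict(ybar=20.0, cy=0.6, cx=0.4, rho=-0.5, N=1000, n=200, r=150)

a_opt = optimum_alpha(params["cy"], params["cx"], params["rho"])
mse_min = mse_first_order(a_opt, **params)

# alpha = 1 corresponds to mean imputation, so this is its variance.
mse_mean = mse_first_order(1.0, **params)
```

At the optimum the assumed MSE reduces to $\bar{Y}^2 C_y^2(\theta_{r,N} - \theta_{r,n}\rho^2)$, which is never larger than the variance of the mean imputation estimator.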
3. Efficiency Comparison of the Proposed Estimator
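Under the optimum value of the constant in equation (22), the proposed estimator is compared with the mean, ratio, and compromised imputation estimators on the basis of MSE, the estimator with the smaller MSE being preferred. The efficiency of the proposed estimator can be observed as follows:
From equations (9) and (20), the proposed estimator is more efficient than the mean imputation estimator if either of the two conditions in equation (23) holds.
From equations (13) and (20), the proposed estimator is more efficient than the ratio imputation estimator if either of the two conditions in equation (24) holds.
From equations (5) and (20), the proposed estimator is more efficient than the compromised imputation estimator if either of the two conditions in equation (25) holds.
When the conditions in equations (23)–(25) are satisfied, the proposed estimator is more efficient than the mean, ratio, and compromised imputation estimators, respectively.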
4. Case Study
For the case study, we obtained hourly average $\mathrm{O_3}$ levels (in ppb) and the hourly average levels of a second pollutant from the PCD website for August 2018. The data form a population of 744 units. On examination, we found that the $\mathrm{O_3}$ concentration data had missing values, so $\mathrm{O_3}$ was taken as the variable of interest $y$, and the concentration of the second pollutant was taken as the auxiliary variable $x$. The following values were obtained for the considered variables: , , , , , , , , , and . We identified 6.03% of the data as missing.
The conditions on the constant under which the proposed estimator is better than the existing estimators are shown in Table 1. The table also presents the percent relative efficiency (PRE) of each estimator with respect to a reference estimator, computed from their MSEs.
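The PRE computation can be sketched as below, assuming the usual definition: the MSE of the reference estimator divided by the MSE of the estimator under study, multiplied by 100. The function name and the MSE values are hypothetical and are not the values of Table 1.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator.
    Values above 100 mean the estimator has a smaller MSE than the reference."""
    return 100.0 * mse_reference / mse_estimator


# Made-up MSE values, for illustration only.
mse_by_estimator = {"reference": 0.80, "estimator_a": 1.00, "estimator_b": 0.64}

pre = {
    name: percent_relative_efficiency(mse_by_estimator["reference"], mse)
    for name, mse in mse_by_estimator.items()
}
```

With this convention the reference estimator always has a PRE of 100, so a table of PRE values can be read directly as a ranking against it.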

The scatter plot (Figure 1) indicates that the $\mathrm{O_3}$ concentration and the concentration of the auxiliary pollutant have a negative relationship, with a correlation coefficient of $-0.47$. From Table 1, considering both the conditions on the constant and the PRE, the proposed estimator is more efficient than the others and is therefore the most suitable for estimating the mean $\mathrm{O_3}$ concentration in this case study. The mean $\mathrm{O_3}$ concentration obtained with the proposed estimator is 13.00 ppb/hr or 0.01 ppm/hr, which does not exceed the average standard of the PCD in Thailand ( ppm/hr). The mean square error of the proposed estimator is 0.07, which is close to zero, indicating that the proposed estimator is effective in this case.
5. Conclusions
In this case study, for missing data occurring in the variable of interest, we proposed an improved compromised imputation method using the chain ratio-type exponential technique for population mean estimation. The mean square error of the proposed estimator was studied under general and optimum situations. The proposed method is useful for estimating the population mean in the presence of common, real-world nonresponse, and it is efficient when applied to a real dataset with missing values under certain conditions on the constant. In this case study, we applied the proposed method to the whole process of estimating the ozone population mean from real data. In contrast, the other authors did not apply their methods to estimate a population mean from real data. In addition, when a dataset contains missing values, the proposed method can complete the data and can also save both time and budget when conducting research. The proposed method is therefore a good strategy to apply in practice when a missing data problem occurs. This study focused on missing values in the variable of interest only, but a similar method could be applied to cases where missing data occur in both the variable of interest and the auxiliary variable.
Data Availability
The data to support this study are available on the website of the Pollution Control Department (PCD) of Thailand (http://air4thai.pcd.go.th/webV2/history/).
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors would like to thank the Pollution Control Department (PCD) of Thailand for supporting the data in this research.
References
[1] A. J. Cohen, H. Ross Anderson, B. Ostro et al., “The global burden of disease due to outdoor air pollution,” Journal of Toxicology and Environmental Health, Part A, vol. 68, no. 13-14, pp. 1301–1307, 2005.
[2] Y. Guo, S. Li, B. Tawatsupa, K. Punnasiri, J. J. K. Jaakkola, and G. Williams, “The association between air pollution and mortality in Thailand,” Scientific Reports, vol. 4, Article ID 5, 2014.
[3] P. V. Moulton and W. Yang, “Air pollution, oxidative stress, and Alzheimer’s disease,” Journal of Environmental and Public Health, vol. 2012, Article ID 472751, 2012.
[4] World Health Organization, Ambient Air Pollution, World Health Organization, Geneva, Switzerland, 2019, https://www.who.int/en/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health.
[5] Pollution Control Department of Thailand, Thailand State of Pollution Report 2015, Pollution Control Department of Thailand, Bangkok, Thailand, 2018, http://infofile.pcd.go.th/mgt/PollutionReport2015_en.pdf.
[6] Y. Ding, G. Liu, X. Guo et al., “Correlation analysis between ozone and other atmosphere characteristics,” Chemical Engineering Transactions, vol. 61, pp. 1747–1752, 2017.
[7] T. Nishanth, K. M. Praseed, M. K. S. Kumar, and K. T. Valsaraj, “Influence of ozone precursors and PM10 on the variation of surface $\mathrm{O_3}$ over Kannur, India,” Atmospheric Research, vol. 138, no. 3, pp. 112–124, 2014.
[8] Pollution Control Department of Thailand, Thailand’s Air Quality and Situation Report, Pollution Control Department of Thailand, Bangkok, Thailand, 2018, http://air4thai.pcd.go.th/webV2/history/.
[9] S. Ali and S. Dacey, “Technical review: performance of existing imputation methods for missing data in SVM ensemble creation,” International Journal of Data Mining and Knowledge Management Process, vol. 7, no. 6, pp. 75–91, 2017.
[10] A. Arroyo, A. Herrero, V. Tricio, E. Corchado, and M. Woźniak, “Neural models for imputation of missing ozone data in air-quality dataset,” Complexity, vol. 2018, Article ID 7238015, 2018.
[11] M. D. N. F. Fitri, N. A. Ramli, A. S. Yahaya et al., “Monsoonal differences and probability distribution of PM10 concentration,” Environmental Monitoring and Assessment, vol. 163, no. 1, pp. 655–667, 2010.
[12] N. M. Noor, A. S. Yahaya, N. A. Ramli, and M. M. A. B. Abdhullah, “The replacement of missing values of continuous air pollution monitoring data using mean top bottom imputation technique,” Journal of Engineering Research and Education, vol. 3, pp. 96–105, 2006.
[13] M. N. Norazian, Y. A. Shukri, R. N. Azam et al., “Estimation of missing values in air pollution data using single imputation technique,” ScienceAsia, vol. 34, pp. 341–345, 2008.
[14] W. G. Cochran, Sampling Techniques, John Wiley and Sons, New York, NY, USA, 3rd edition, 1977.
[15] S. Bahl and R. K. Tuteja, “Ratio and product type exponential estimators,” Journal of Information and Optimization Sciences, vol. 12, no. 1, pp. 159–164, 1991.
[16] H. P. Singh and S. K. Pal, “A new chain ratio-ratio-type exponential estimator using auxiliary information in sample surveys,” International Journal of Mathematics and its Applications, vol. 3, no. 4, pp. 37–46, 2015.
[17] H. Lee, E. Rancourt, and C. E. Sarndal, “Experiments with variance estimation from survey data with imputed values,” Journal of Official Statistics, vol. 10, no. 3, pp. 231–243, 1994.
[18] S. Singh and S. Horn, “Compromised imputation in survey sampling,” Metrika, vol. 51, pp. 266–276, 2000.
Copyright
Copyright © 2020 Kanisa Chodjuntug and Nuanpan Lawson. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.