Scientific Programming

Volume 2019, Article ID 6780379, 12 pages

https://doi.org/10.1155/2019/6780379

## A Novel Phase Space Reconstruction- (PSR-) Based Predictive Algorithm to Forecast Atmospheric Particulate Matter Concentration

Correspondence should be addressed to Wajid Aziz; kh_wajid@yahoo.com

Received 26 March 2019; Revised 12 June 2019; Accepted 4 July 2019; Published 25 July 2019

Academic Editor: Cristian Mateos

Copyright © 2019 Syed Ahsin Ali Shah et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The prediction of atmospheric particulate matter (APM) concentration is essential to reduce adverse effects on human health and to enforce emission restrictions. The dynamics of APM are inherently nonlinear and chaotic. Phase space reconstruction (PSR) is one of the widely used methods for chaotic time series analysis. The APM mass concentrations are an outcome of complex anthropogenic contributors evolving with time, which may operate on multiple time scales. Thus, the traditional single-variable PSR-based prediction algorithm in which data points of last embedding dimension are used as a target set may fail to account for multiple time scales inherent in APM concentrations. To address this issue, we propose a novel PSR-based scientific solution that accounts for the information contained at multiple time scales. Different machine learning algorithms are used to evaluate the performance of the proposed and traditional PSR techniques for predicting mass concentrations of particulate matter up to 2.5 micron (PM_{2.5}), up to 10 micron (PM_{10.0}), and ratio of PM_{2.5}/PM_{10.0}. Hourly time series data of PM_{2.5} and PM_{10.0} mass concentrations are collected from January 2014 to September 2015 at the Masfalah air quality monitoring station (couple of kilometers from the Holy Mosque in Makkah, Saudi Arabia). The performances of various learning algorithms are evaluated using RMSE and MAE. The results demonstrated that prediction error of all the machine learning techniques is smaller for the proposed PSR approach compared to traditional approach. For PM_{2.5}, FFNN leads to best results (both RMSE and MAE 0.04 *μ*gm^{−3}), followed by SVR-L (RMSE 0.01 *μ*gm^{−3} and MAE 0.09 *μ*gm^{−3}) and RF (RMSE 1.27 *μ*gm^{−3} and MAE 0.86 *μ*gm^{−3}). For PM_{10.0}, SVR-L leads to best results (both RMSE and MAE 0.06 *μ*gm^{−3}), followed by FFNN (RMSE 0.13 *μ*gm^{−3} and MAE 0.09 *μ*gm^{−3}) and RF (RMSE 1.60 *μ*gm^{−3} and MAE 1.16 *μ*gm^{−3}). 
For PM_{2.5}/PM_{10.0}, FFNN is the most accurate method for prediction (0.001 for both RMSE and MAE), followed by RF (0.02 for both RMSE and MAE) and SVR-L (RMSE 0.05 and MAE 0.04).

#### 1. Introduction

Air pollution is one of the emerging environmental issues in developing as well as developed countries across the globe [1]. A large amount of gaseous pollutants and other atmospheric particulate matter (APM) is produced by pollution-generating activities, including vehicle exhaust, fossil fuels used for energy and cooking, and other anthropogenic activities [2]. APM is reportedly one of the major causes of adverse health effects, particularly those related to the human respiratory and cardiovascular systems [3].

Depending upon aerodynamic diameter, atmospheric particles can be classified into three types, namely, coarse particle fraction (CPF), fine particle fraction (FPF), and ultrafine particles (UFP). CPF comprises particles of diameter larger than 2.5 micrometers (*μ*m) and up to 10 *μ*m (PM_{10.0}), while FPF has diameter up to 2.5 *μ*m (PM_{2.5}), and those having less than 1 *μ*m (PM_{1.0}) diameter are UFP [4]. Crustal material, paved road dust, background sea salts, and noncatalyst equipped gasoline engines are major sources of CPF (PM_{10.0}), while vapor nucleation/condensation mechanisms and anthropogenic sources are responsible for FPF (PM_{2.5}) [5]. The lifetime of atmospheric particles, which spans from a few seconds to several months, is another aspect that determines their harmfulness [4]. Besides emission sources, levels of PM_{2.5} and PM_{10.0} depend on the geographic characteristics and meteorological parameters including wind, relative humidity, temperature, atmospheric pressure, and boundary layer height [6, 7].

Air quality can be predicted through time series analysis, which in turn may be used for issuing warnings to protect the health of the public. Classical approaches to predicting air pollutant concentrations are generally based on the functional relationship between air quality, emissions, and meteorological factors. Examples include regression and neural network techniques, which have been used to predict APM in numerous studies [8–11]. In the absence of emission data and/or meteorological factors, pollutant concentration time series data are the only available information. In such cases, linear correlation-based univariate analysis techniques, including the autocorrelation function and spectral analysis [8, 12], are generally used. These techniques are suited to time series with regular behavior. The dynamics of atmospheric pollutants, however, are complex, and nonlinearity is inherent in atmospheric systems. The time series data of atmospheric mass concentrations are chaotic and very sensitive to initial conditions [13, 14].

Phase space reconstruction (PSR) is the foundation of nonlinear time series analysis that allows the reconstruction of complete system dynamics using a single time series [15]. The most common approach to PSR of a time series is based on Takens’ delay embedding theorem [16]. Using this theorem, a single vector of observations representing a chaotic system can be regenerated into a series of multidimensional vectors. The regenerated vectors can thus display numerous essential properties of the original time series, provided that the embedding dimension is sufficiently large [17]. Two parameters are important for the computation of PSR, i.e., time delay ($\tau$) and embedding dimension ($m$).

Numerous studies used PSR-based techniques to capture the complex dynamics of air pollutant concentration time series [13, 14, 18–25], which were then used for prediction. Li et al. [18] performed nonlinear analysis of air quality data to identify the dynamics of the ozone concentrations and to determine the dimensionality of the system. Chen et al. [19] proposed a novel procedure, based on dynamical systems theory, to model and predict ozone levels by creating a multidimensional phase space map from observed ozone concentrations. The proposed model was used to make one-hour to one-day ahead predictions of ozone levels. Kocak et al. [20] reconstructed the attractor in the multidimensional space of the univariate ozone time series and then used local approximation to predict the ozone concentration at different stations. Chelani et al. [21] examined the predictability of chaotic time series of air pollutant (nitrogen dioxide) concentration using artificial neural networks. Chelani and Devotta [22] predicted PM_{10.0} using local polynomial approximation based on the reconstructed phase space. In another study, Chelani and Devotta [23] developed a hybrid model combining the autoregressive integrated moving average model, which deals with linear patterns, and a nonlinear dynamical model. Using the nitrogen dioxide concentration time series, they demonstrated that the hybrid model outperforms the individual linear and nonlinear models. Kumar et al. [13] employed a correlation dimension method that uses PSR to identify nonlinearity and chaos in nitrogen dioxide and carbon monoxide time series. Yu et al. [24] applied PSR to the air pollution index time series of the past 10 years and found that the PM_{10.0} time series behavior is chaotic in Lanzhou, China. Saeed et al. [25] investigated the chaotic behavior of PM_{1.0} and PM_{2.5} concentrations using PSR, the largest Lyapunov exponent, and the Hurst exponent and found strong chaotic behavior in the time series.

The previous studies [26–28] used last embedding dimension data points of PSR time series as the target set. Recently, the concept of multiple time scales has been introduced to study dynamics of healthy and pathological physiological systems such as regularity mechanism of cardiovascular system [29, 30], postural control [31], and gait dynamics [32]. The APM mass concentrations are an outcome of complex natural and anthropogenic contributors evolving with time, which may operate on multiple time scales. Thus, the traditional single-variable PSR algorithm [26–28] in which data points of last embedding dimension are used as a target dataset may fail to account for multiple time scales inherent in APM concentrations.

In this study, we propose a novel PSR-based scientific solution that accounts for the information contained at multiple time scales to predict mass concentrations of atmospheric particulates in air. The data used in this study are collected from the Masfalah air quality monitoring station, Makkah, Saudi Arabia [6]. Previously, Munir et al. [6] used these data to analyze the mass concentrations of PM_{2.5} and its association with PM_{10.0} and meteorology. This site is important because, throughout the year, a huge number of pilgrims visit Saudi Arabia to perform religious obligations using this road. Makkah is surrounded by large sandy deserts, receives little rain, and experiences high temperature throughout the year [6]. The expansion of the Holy Mosque, construction of railway stations, mountain digging and construction of multistoried buildings, frequent sand and dust storms, and frequent traffic jams and congestion during the busy hours contribute to the atmospheric pollution in the city [6, 7]. Millions of pilgrims visiting for Umrah and Hajj every year put an additional burden on local resources and air quality. Moreover, due to the geographical characteristics and climatic conditions, PM_{2.5} and PM_{10.0} pollutants frequently exceed the national and international air quality standards, which is one of the major concerns in this region [6, 33]. Hence, early prediction is a managerial solution to avoid hazardous implications of atmospheric particulates on the local community as well as pilgrims.

Machine learning techniques have been widely used for classification, clustering, and association in numerous fields [34, 35]. Recently, the combination of PSR of a chaotic model with the support vector machine (SVM) has been explored for time series prediction [36]. We used different machine learning techniques, including support vector regression (SVR), random forest (RF), and feedforward neural network (FFNN) [37–39], for the prediction of atmospheric particulates based on the proposed and traditional settings of the target set. Root-mean-squared error (RMSE) and mean absolute error (MAE) are used to evaluate the performance of the learning algorithms for the prediction of atmospheric particulates under the proposed and traditional PSR methods.
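As a point of reference for the two error measures, the following minimal Python sketch computes RMSE and MAE for a pair of observed and predicted sequences. The function names and the sample values are hypothetical and are not taken from the study data.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical hourly concentrations, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 12.0, 13.0, 18.0]

error_rmse = rmse(observed, forecast)  # sqrt((1 + 0 + 1 + 4) / 4)
error_mae = mae(observed, forecast)    # (1 + 0 + 1 + 2) / 4
```

Because squaring weights large deviations more heavily, RMSE is never smaller than MAE on the same set of errors, and the gap between the two grows when a few predictions are far off.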

#### 2. Materials and Methods

##### 2.1. Datasets

The data used in this research work have been collected from the Masfalah air quality monitoring station (AQMS111) in the Holy city of Makkah, Saudi Arabia. The data were previously used by Munir et al. [6] to characterize the spatial and temporal variability of PM_{2.5}, PM_{10.0}, and their ratio PM_{2.5}/PM_{10.0} in the region.

The concentrations of PM_{2.5} and PM_{10.0} were monitored using an Aeroqual AQM 60 air quality monitoring station [6]. This device uses a light-scattering nephelometer and a high-precision sharp cut cyclone to monitor particles and has a range of 0–2000 *μ*gm^{−3} with an accuracy of ±2 for both PM_{2.5} and PM_{10.0}. Hourly data collected from January 2014 to September 2015 of PM_{2.5} (*μ*gm^{−3}), PM_{10.0} (*μ*gm^{−3}), and the ratio PM_{2.5}/PM_{10.0} have been used to evaluate the usefulness of the proposed modification in the PSR prediction algorithm. The quality of the data is ensured by strict quality assurance and quality control (QA/QC) measures [6]. QA measures include careful selection of the monitoring site, proper instrument installation, instrument selection, sample system design, and proper training of operators. QC is ensured by measures including careful selection of the monitoring site, instrument calibration and its response, monitoring calibration gases, routine site visits, and data review as well as data validation and ratification. The data were screened for missing values and outliers. Kline [40] suggested that missing data can be handled by deletion, by imputation estimates, or by modeling the data as a distribution for its estimation. If missing data are <5%, then any simple mechanism is acceptable for its identification and correction [41]. Both PM_{2.5} and PM_{10.0} data contain less than 2% missing values, and we used the deletion approach for handling missing data. The outliers in the data are replaced by the mean of the data for that specific month.

##### 2.2. Methodology

Before describing the proposed PSR methodology, traditional PSR technique and procedures for selection of time delay and embedding dimension are detailed for clarity of methodology.

###### 2.2.1. Phase Space Reconstruction (PSR)

PSR theory [14] is the basis of chaotic time series analysis. In a chaotic system, the phase space can be reconstructed from a univariate time series, because in a dynamical system the whole information about the system is present in the time series of a single variable. Each point of the phase space represents a state of the system, while a trajectory in the phase space represents the time evolution of the system from a given initial condition.

Using Takens’ time-delay embedding theorem, a phase space can be created from a one-dimensional time series [14]. According to the theorem, if a scalar time series $\{x_1, x_2, \ldots, x_N\}$ from a chaotic system is given, then reconstruction is possible in terms of the phase space vectors $X_i$ expressed as

$$X_i = \left(x_i,\; x_{i+\tau},\; x_{i+2\tau},\; \ldots,\; x_{i+(m-1)\tau}\right),$$

where $i = 1, 2, \ldots, M$. Here, $\tau$ is the time delay, $m$ is the embedding dimension of PSR, and $M = N - (m-1)\tau$ is the number of phase points of the reconstructed phase space. Appropriate selection of $\tau$ and $m$ is essential in PSR.
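The delay-vector construction can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, indices are 0-based in the code (the text uses 1-based indices), and the helper that splits off the last coordinate reflects one plausible reading of the traditional target-set choice described in the Introduction.

```python
def delay_embed(series, m, tau):
    """Return the N - (m - 1) * tau delay vectors of a scalar series."""
    n_vectors = len(series) - (m - 1) * tau
    if m < 1 or tau < 1 or n_vectors < 1:
        raise ValueError("series is too short for this m and tau")
    return [
        [series[i + k * tau] for k in range(m)]
        for i in range(n_vectors)
    ]


def split_last_coordinate(vectors):
    """Hypothetical helper: first m - 1 coordinates as inputs, last as target.

    Assumption: this mirrors the traditional setting in which the data
    points of the last embedding dimension form the target set.
    """
    inputs = [v[:-1] for v in vectors]
    targets = [v[-1] for v in vectors]
    return inputs, targets


x = [1, 2, 3, 4, 5, 6, 7, 8]
vectors = delay_embed(x, m=3, tau=2)
# N - (m - 1) * tau = 8 - 2 * 2 = 4 vectors:
# [1, 3, 5], [2, 4, 6], [3, 5, 7], [4, 6, 8]
inputs, targets = split_last_coordinate(vectors)
```

Each row of `vectors` is one phase point; a regression model trained on `inputs` to predict `targets` then acts as a one-step predictor in the reconstructed space.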

The selection of $\tau$ has centered around two commonly used methods, i.e., the autocorrelation function (ACF) and average mutual information (AMI) [42]. The ACF is used for estimating $\tau$ of linear time series, whereas AMI is used for estimating $\tau$ for nonlinear time series. Since mass concentration time series data of the atmosphere are nonlinear in nature, we used the AMI function, which accounts for the nonlinear correlation in a specific time series, to evaluate $\tau$ for that time series [42]. The equation to calculate AMI is as follows:

$$I(\tau) = \sum_{x_i,\, x_{i+\tau}} P(x_i, x_{i+\tau}) \log \frac{P(x_i, x_{i+\tau})}{P(x_i)\, P(x_{i+\tau})},$$

where $P(x_i)$ is the probability density of $x_i$, and $P(x_i, x_{i+\tau})$ is the joint probability density of $x_i$ and $x_{i+\tau}$. $I(\tau)$ is a measure of the statistical dependence of the reconstruction variables. For a nonmonotonic decrease of $I(\tau)$, the location of the first local minimum is considered the suitable value of $\tau$ [43]. For a monotonic decrease of $I(\tau)$, the decrease of $I(\tau)$ to a fixed threshold level can be used as the criterion for estimating the time delay [43].
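A histogram-based estimate of the AMI and the first-local-minimum rule can be sketched as follows. This is a minimal sketch under stated assumptions: equal-width binning, the number of bins, the base-2 logarithm, and the function names are illustrative choices that the text does not specify, and the sine series is synthetic test input rather than PM data.

```python
import math
from collections import Counter


def average_mutual_information(series, tau, bins=16):
    """Histogram estimate of the AMI (in bits) between x_i and x_{i+tau}."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series

    def bin_of(value):
        return min(int((value - lo) / width), bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series, series[tau:])]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # P(a, b) = c / n, P(a) = left[a] / n, P(b) = right[b] / n
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def first_local_minimum(values):
    """Index of the first interior local minimum, or None if there is none."""
    for k in range(1, len(values) - 1):
        if values[k - 1] > values[k] <= values[k + 1]:
            return k
    return None


# Synthetic example: scan candidate delays 1..39 on a sampled sine wave.
series = [math.sin(0.2 * i) for i in range(500)]
ami = [average_mutual_information(series, tau) for tau in range(1, 40)]
k = first_local_minimum(ami)
tau_opt = None if k is None else k + 1  # ami[0] corresponds to tau = 1
```

Histogram estimates depend on the bin count, so in practice the position of the first minimum is worth checking for a few different values of `bins`.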

The false nearest neighbor (FNN) approach introduced by Kennel et al. [43] is used for computing the optimal $m$. The FNN algorithm takes each point in the $m$-dimensional portrait and finds the distance to its nearest neighbor, and then the distance between the same two points in $m+1$ dimensions. Neighbors are said to be false if the following two criteria are met [43]:

$$\Delta R_m(i) > R_{\text{tol}}, \qquad \frac{R_{m+1}(i)}{R_A} > A_{\text{tol}},$$

where $R_m(i)$ is the Euclidean distance between point $i$ and its nearest neighbor in $m$ dimensions, and $\Delta R_m(i)$ is the relative increase in the Euclidean distance when the dimension of PSR is increased from $m$ to $m+1$, and it is computed as

$$\Delta R_m(i) = \sqrt{\frac{R_{m+1}^2(i) - R_m^2(i)}{R_m^2(i)}}.$$

The parameters $R_{\text{tol}}$ and $A_{\text{tol}}$ are constant thresholds, and $R_A$ is the standard deviation of the time series. The process is repeated for increasing dimensions $m$ and is stopped when the proportion of FNN becomes zero or sufficiently small and remains so from then onwards.
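The FNN test can be sketched in plain Python as below. This is an illustrative sketch rather than the authors' implementation: the function name, the brute-force neighbor search, and the default thresholds (`r_tol=15.0`, `a_tol=2.0`) are assumptions. The hypothetical `require_both` flag controls whether a neighbor must exceed both thresholds (the reading of the text above) or either one, which is how the FNN test is commonly applied.

```python
import math


def false_neighbor_fraction(series, m, tau, r_tol=15.0, a_tol=2.0,
                            require_both=True):
    """Fraction of nearest neighbors in dimension m that are false in m + 1."""
    n_points = len(series) - m * tau  # points that also exist in m + 1 dims
    if m < 1 or tau < 1 or n_points < 2:
        raise ValueError("series is too short for this m and tau")
    mean = sum(series) / len(series)
    r_a = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    vectors = [[series[i + k * tau] for k in range(m)]
               for i in range(n_points)]

    n_false = 0
    for i, v in enumerate(vectors):
        # Brute-force nearest neighbor in m dimensions (O(n^2) overall).
        dist, j = min(
            (math.dist(v, w), j) for j, w in enumerate(vectors) if j != i
        )
        # Separation along the coordinate added in dimension m + 1.
        extra = abs(series[i + m * tau] - series[j + m * tau])
        dist_next = math.hypot(dist, extra)
        # Relative increase extra / dist > r_tol, written without division.
        ratio_test = extra > r_tol * dist
        size_test = r_a > 0 and dist_next > a_tol * r_a
        if require_both:
            is_false = ratio_test and size_test
        else:
            is_false = ratio_test or size_test
        if is_false:
            n_false += 1
    return n_false / n_points


# Synthetic example: scan embedding dimensions 1..5 on a sampled sine wave.
signal = [math.sin(0.2 * i) for i in range(300)]
fractions = [
    false_neighbor_fraction(signal, m, tau=8, require_both=False)
    for m in range(1, 6)
]
```

The embedding dimension is then taken as the smallest `m` at which the fraction drops to zero or to a small value and stays there; for long hourly series, a tree-based neighbor search would replace the quadratic loop.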

###### 2.2.2. Proposed Methodology

The whole procedure of PSR-based prediction is illustrated in Figure 1.