#### Abstract

The prediction of atmospheric particulate matter (APM) concentration is essential to reduce adverse effects on human health and to enforce emission restrictions. The dynamics of APM are inherently nonlinear and chaotic. Phase space reconstruction (PSR) is one of the widely used methods for chaotic time series analysis. The APM mass concentrations are an outcome of complex anthropogenic contributors evolving with time, which may operate on multiple time scales. Thus, the traditional single-variable PSR-based prediction algorithm, in which the data points of the last embedding dimension are used as the target set, may fail to account for the multiple time scales inherent in APM concentrations. To address this issue, we propose a novel PSR-based scientific solution that accounts for the information contained at multiple time scales. Different machine learning algorithms are used to evaluate the performance of the proposed and traditional PSR techniques for predicting mass concentrations of particulate matter up to 2.5 micron (PM_{2.5}), up to 10 micron (PM_{10.0}), and the ratio PM_{2.5}/PM_{10.0}. Hourly time series data of PM_{2.5} and PM_{10.0} mass concentrations were collected from January 2014 to September 2015 at the Masfalah air quality monitoring station (a couple of kilometers from the Holy Mosque in Makkah, Saudi Arabia). The performance of the learning algorithms is evaluated using RMSE and MAE. The results demonstrate that the prediction error of all the machine learning techniques is smaller for the proposed PSR approach than for the traditional approach. For PM_{2.5}, FFNN leads to the best results (both RMSE and MAE 0.04 *μ*gm^{−3}), followed by SVR-L (RMSE 0.01 *μ*gm^{−3} and MAE 0.09 *μ*gm^{−3}) and RF (RMSE 1.27 *μ*gm^{−3} and MAE 0.86 *μ*gm^{−3}). For PM_{10.0}, SVR-L leads to the best results (both RMSE and MAE 0.06 *μ*gm^{−3}), followed by FFNN (RMSE 0.13 *μ*gm^{−3} and MAE 0.09 *μ*gm^{−3}) and RF (RMSE 1.60 *μ*gm^{−3} and MAE 1.16 *μ*gm^{−3}). For PM_{2.5}/PM_{10.0}, FFNN is the most accurate method for prediction (0.001 for both RMSE and MAE), followed by RF (0.02 for both RMSE and MAE) and SVR-L (RMSE 0.05 and MAE 0.04).

#### 1. Introduction

Air pollution is one of the emerging environmental issues in developing as well as developed countries across the globe [1]. Large amounts of gaseous pollutants and atmospheric particulate matter (APM) are produced by pollution-generating activities, including smoke-emitting vehicles, fossil fuels burned to meet energy requirements, cooking, and other anthropogenic activities [2]. APM is reportedly one of the major causes of adverse health effects, particularly those related to the human respiratory and cardiovascular systems [3].

Depending upon aerodynamic diameter, atmospheric particles can be classified into three types, namely, coarse particle fraction (CPF), fine particle fraction (FPF), and ultrafine particles (UFP). CPF comprises particles with diameter larger than 2.5 micrometers (*μ*m) and up to 10 *μ*m (PM_{10.0}), while FPF has diameter up to 2.5 *μ*m (PM_{2.5}), and particles with diameter less than 1.0 *μ*m (PM_{1.0}) are UFP [4]. Crustal material, paved road dust, background sea salts, and noncatalyst equipped gasoline engines are major sources of CPF (PM_{10.0}), while vapor nucleation/condensation mechanisms and anthropogenic sources are responsible for FPF (PM_{2.5}) [5]. The lifetime of atmospheric particles, which spans from a few seconds to several months, is another aspect that determines their harmfulness [4]. Besides emission sources, levels of PM_{2.5} and PM_{10.0} depend on the geographic characteristics and meteorological parameters including wind, relative humidity, temperature, atmospheric pressure, and boundary layer height [6, 7].

Air quality can be predicted through time series analysis, which in turn may be used for issuing warnings to protect the health of the public. The classical approaches to predicting air pollutant concentrations are generally based on the functional relationship between air quality, emissions, and meteorological factors. Examples include regression and neural network techniques, which have been used to predict APM in numerous studies [8–11]. In the absence of emission data and/or meteorological factors, pollutant concentration time series data are the only available information. In such cases, linear correlation-based univariate analysis techniques, including the autocorrelation function and spectral analysis [8, 12], are generally used. These techniques are suited to time series with regular behavior. In contrast, the dynamics of atmospheric pollutants are complex; nonlinearity is inherent in atmospheric systems. The time series data of atmospheric mass concentrations are chaotic and very sensitive to initial conditions [13, 14].

Phase space reconstruction (PSR) is the foundation of nonlinear time series analysis that allows the reconstruction of complete system dynamics using a single time series [15]. The most common approach for PSR is based on Takens’ delay embedding theorem [16]. Using this theorem, a single vector of observations representing a chaotic system can be regenerated into a series of multidimensional vectors. The regenerated vectors can thus display numerous essential properties of the original time series provided that the embedding dimension is sufficiently large [17]. Two parameters are important for the computation of PSR, i.e., the time delay ($\tau$) and the embedding dimension ($m$).

Numerous studies used PSR-based techniques to capture the complex dynamics of air pollutant concentration time series [13, 14, 18–25], which were then used for prediction. Li et al. [18] performed nonlinear analysis of air quality data to identify the dynamics of the ozone concentrations and to determine the dimensionality of the system. Chen et al. [19] proposed a novel procedure, based on dynamical systems theory, to model and predict ozone levels by creating a multidimensional phase space map from observed ozone concentrations. The proposed model was used to make one hour to one day ahead predictions of ozone levels. Kocak et al. [20] reconstructed the attractor in the multidimensional space of the univariate ozone time series and then used local approximation to predict the ozone concentration at different stations. Chelani et al. [21] examined the predictability of chaotic time series of air pollutant (nitrogen dioxide) concentration using artificial neural networks. Chelani and Devotta [22] predicted PM_{10.0} using local polynomial approximation based on the reconstructed phase space. In another study, Chelani and Devotta [23] developed a hybrid model combining the autoregressive integrated moving average model, which deals with linear patterns, and a nonlinear dynamical model. Using the nitrogen dioxide concentration time series, they demonstrated that the hybrid model outperforms the individual linear and nonlinear models. Kumar et al. [13] employed a correlation dimension method that uses PSR to identify nonlinearity and chaos in nitrogen dioxide and carbon monoxide time series. Yu et al. [24] applied PSR to air pollution index time series covering the past 10 years and found that the PM_{10.0} time series behavior is chaotic in Lanzhou, China. Saeed et al. [25] investigated the chaotic behavior of PM_{1.0} and PM_{2.5} concentrations using PSR, the largest Lyapunov exponent, and the Hurst exponent and found strong chaotic behavior in the time series.

Previous studies [26–28] used the data points of the last embedding dimension of the PSR series as the target set. Recently, the concept of multiple time scales has been introduced to study the dynamics of healthy and pathological physiological systems such as the regularity mechanism of the cardiovascular system [29, 30], postural control [31], and gait dynamics [32]. The APM mass concentrations are an outcome of complex natural and anthropogenic contributors evolving with time, which may operate on multiple time scales. Thus, the traditional single-variable PSR algorithm [26–28], in which the data points of the last embedding dimension are used as the target dataset, may fail to account for the multiple time scales inherent in APM concentrations.

In this study, we propose a novel PSR-based scientific solution that accounts for the information contained at multiple time scales to predict mass concentrations of atmospheric particulates. The data used in this study were collected from the Masfalah air quality monitoring station, Makkah, Saudi Arabia [6]. Previously, Munir et al. [6] used these data to analyze the mass concentrations of PM_{2.5} and its association with PM_{10.0} and meteorology. This site is important because, throughout the year, a huge number of pilgrims visiting Saudi Arabia to perform religious obligations use this road. Makkah is surrounded by large sandy deserts, receives little rain, and experiences high temperatures throughout the year [6]. The expansion of the Holy Mosque, construction of railway stations, mountain digging, construction of multistoried buildings, frequent sand and dust storms, and frequent traffic jams and congestion during busy hours contribute to the atmospheric pollution in the city [6, 7]. Millions of pilgrims visiting for Umrah and Hajj every year put an additional burden on local resources and air quality. Moreover, due to the geographical characteristics and climatic conditions, PM_{2.5} and PM_{10.0} pollutants frequently exceed the national and international air quality standards, which is one of the major concerns in this region [6, 33]. Hence, early prediction is a managerial solution to avoid hazardous implications of atmospheric particulates for the local community as well as pilgrims.

Machine learning techniques have been widely used for classification, clustering, and association in numerous fields [34, 35]. Recently, the combination of PSR of a chaotic model with the support vector machine (SVM) has been explored for the prediction of time series [36]. We used different machine learning techniques, including support vector regression (SVR), random forest (RF), and feedforward neural network (FFNN) [37–39], for the prediction of atmospheric particulates based on the proposed and traditional settings of the target set. Root-mean-squared error (RMSE) and mean absolute error (MAE) are used to evaluate the performance of the learning algorithms for the prediction of atmospheric particulates under the proposed and traditional PSR methods.

#### 2. Materials and Methods

##### 2.1. Datasets

The data used in this research work have been collected from the Masfalah air quality monitoring station (AQMS111) in the Holy city of Makkah, Saudi Arabia. The data were previously used by Munir et al. [6] to characterize the spatial and temporal variability of PM_{2.5}, PM_{10.0}, and their ratio PM_{2.5}/PM_{10.0} in the region.

The concentrations of PM_{2.5} and PM_{10.0} were monitored using Aeroqual AQM 60 air quality monitoring station [6]. This device uses light scattering nephelometer and high-precision sharp cut cyclone to monitor particles and has a range of 0–2000 *μ*gm^{−3} with an accuracy of ±2 for both PM_{2.5} and PM_{10.0}. Hourly data collected from January 2014 to September 2015 of PM_{2.5} (*μ*gm^{−3}), PM_{10.0} (*μ*gm^{−3}), and ratio of PM_{2.5}/PM_{10.0} have been used to evaluate the usefulness of the proposed modification in the PSR prediction algorithm. The quality of data is ensured by taking strict quality assurance and quality control (QA/QC) measures [6]. QA measures include careful selection of monitoring site, proper instrument installation, instrument selection, sample system design, and proper training of operators. QC is ensured by taking measures including careful selection of monitoring site, instrument calibration and its response, monitoring calibration gases, routine site visit, and data review as well as data validation and ratification. Data screening for missing values and outliers was done. Kline [40] suggested that missing data can be handled by deletion, imputation estimates or by modeling the data as a distribution for its estimation. If missing data are <5%, then any simple mechanism is acceptable for its identification and correction [41]. Both PM_{2.5} and PM_{10.0} data contain less than 2% missing values, and we used deletion approach for handling missing data. The outliers in the data are replaced by means of data for that specific month.

##### 2.2. Methodology

Before describing the proposed PSR methodology, traditional PSR technique and procedures for selection of time delay and embedding dimension are detailed for clarity of methodology.

###### 2.2.1. Phase Space Reconstruction (PSR)

PSR theory [14] is the basis of chaotic time series analysis. In a chaotic system, the phase space can be reconstructed from a univariate time series. This is because, in a dynamical system, the whole information about the system variables is present in the univariate time series. Each point of the phase space represents a state of the system, while a trajectory in the phase space represents the time evolution of the system from a given initial condition.

Using Takens’ time-delay embedding theorem, a phase space can be created from a one-dimensional time series [14]. This theorem provides a way of analyzing chaotic time series. According to the theorem, if a scalar time series $\{x_i\}_{i=1}^{N}$ from a chaotic system is given, then reconstruction is possible in terms of the phase space vectors expressed as

$$X_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right), \tag{1}$$

where $i = 1, 2, \ldots, M$. Here, $\tau$ is the time delay, $m$ is the embedding dimension of PSR, and $M = N - (m-1)\tau$ is the number of phase points of the reconstructed phase space. Computation of the $\tau$ and $m$ values is essential in PSR.
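As an illustration of the delay-vector construction, the following minimal Python sketch builds the reconstructed vectors from a scalar series. The function name `delay_embed` and the toy series are illustrative assumptions, not part of the original study.

```python
def delay_embed(series, m, tau):
    """Return the M = N - (m - 1) * tau delay vectors of a scalar series."""
    n_points = len(series) - (m - 1) * tau
    if n_points <= 0:
        raise ValueError("series too short for the requested m and tau")
    # Row i holds (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}) with 0-based i.
    return [[series[i + j * tau] for j in range(m)] for i in range(n_points)]


# Toy series 0..9 with m = 3 and tau = 2 gives M = 10 - 2 * 2 = 6 vectors.
vectors = delay_embed(list(range(10)), m=3, tau=2)
print(len(vectors), vectors[0], vectors[-1])  # 6 [0, 2, 4] [5, 7, 9]
```

Each row is one phase point, so the number of rows shrinks as either the delay or the embedding dimension grows.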

The selection of $\tau$ has centered around two commonly used methods, i.e., the autocorrelation function (ACF) and average mutual information (AMI) [42]. The ACF is used for estimating $\tau$ of linear time series, whereas AMI is used for estimating $\tau$ of nonlinear time series. Since the mass concentration time series of atmospheric particulates are nonlinear in nature, we used the AMI function, which accounts for the nonlinear correlation in a time series, to evaluate $\tau$ [42]. The AMI is calculated as follows:

$$I(\tau) = \sum_{x_i,\, x_{i+\tau}} P(x_i, x_{i+\tau}) \log \left[ \frac{P(x_i, x_{i+\tau})}{P(x_i)\, P(x_{i+\tau})} \right], \tag{2}$$

where $P(x_i)$ is the probability density of $x_i$ and $P(x_i, x_{i+\tau})$ is the joint probability density of $x_i$ and $x_{i+\tau}$. $I(\tau)$ is a measure of the statistical dependence of the reconstruction variables. For a nonmonotonic decrease of $I(\tau)$, the location of the first local minimum is considered the suitable value of $\tau$ [43]. For a monotonic decrease of $I(\tau)$, the decrease of MI to either of two threshold levels can be used as the criterion for estimating the time delay [43].
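The delay selection can be sketched as follows, assuming a simple equal-width histogram estimator of the probabilities (the estimator, the bin count `bins=16`, the natural logarithm, and all function names are assumptions made for illustration; the study does not state which estimator was used).

```python
import math
from collections import Counter


def average_mutual_information(series, tau, bins=16):
    """Histogram estimate of I(tau), in nats, between x_i and x_{i+tau}."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series
    # Map every sample to a bin index in [0, bins - 1].
    idx = [min(int((x - lo) / width), bins - 1) for x in series]
    pairs = list(zip(idx[:-tau], idx[tau:]))
    n = len(pairs)
    joint = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    # P(a, b) * log(P(a, b) / (P(a) * P(b))) with P = count / n.
    return sum(
        (c / n) * math.log(c * n / (p_a[a] * p_b[b]))
        for (a, b), c in joint.items()
    )


def first_local_minimum(values):
    """1-based lag of the first local minimum of [I(1), I(2), ...], or None."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k + 1
    return None


# Illustrative series only: a sine wave sampled 40 times per period.
series = [math.sin(2 * math.pi * i / 40) for i in range(400)]
ami_curve = [average_mutual_information(series, t) for t in range(1, 31)]
print(first_local_minimum(ami_curve))
```

If the curve has no interior minimum, `first_local_minimum` returns `None`, which corresponds to the monotonic case where a threshold criterion is needed instead.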

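The false nearest neighbor (FNN) approach introduced by Kennel et al. [43] is used for computing the optimal $m$. The FNN algorithm takes each point in the $m$-dimensional portrait and finds the distance $R_m(i)$ to its nearest neighbor and the distance $R_{m+1}(i)$ between the two points in $m+1$ dimensions. Neighbors are said to be false if the following two criteria are met [43]:

$$\frac{\left|x_{i+m\tau} - x^{\mathrm{NN}}_{i+m\tau}\right|}{R_m(i)} > R_{\mathrm{tol}}, \qquad \frac{R_{m+1}(i)}{R_A} > A_{\mathrm{tol}}, \tag{3}$$

where $x^{\mathrm{NN}}_{i+m\tau}$ is the added coordinate of the nearest neighbor. The left-hand side of the first criterion is the relative increase in the Euclidean distance when the dimension of PSR is increased from $m$ to $m+1$, and it is computed as

$$\sqrt{\frac{R_{m+1}^{2}(i) - R_{m}^{2}(i)}{R_{m}^{2}(i)}}.$$

The parameters $R_{\mathrm{tol}}$ and $A_{\mathrm{tol}}$ are constant thresholds, and $R_A$ is the standard deviation of the time series. The process is repeated for dimensions $m = 1, 2, \ldots$ and is stopped when the proportion of FNN becomes zero or sufficiently small and remains so from then onwards.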
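A brute-force sketch of the FNN test is given below. It follows the wording above and flags a neighbor only when both criteria hold; some formulations flag a neighbor when either criterion holds, which corresponds to replacing `and` with `or`. The function name and the default thresholds (`r_tol=10.0`, `a_tol=2.0`) are assumptions for illustration, not the values used in this study.

```python
import math


def fnn_fraction(series, m, tau, r_tol=10.0, a_tol=2.0):
    """Fraction of false nearest neighbors at embedding dimension m."""
    n_vec = len(series) - m * tau  # leaves room for the (m+1)-th coordinate
    if n_vec < 2:
        raise ValueError("series too short for the requested m and tau")
    mean = sum(series) / len(series)
    r_a = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if r_a == 0:
        raise ValueError("constant series has no attractor size")
    vecs = [[series[i + j * tau] for j in range(m)] for i in range(n_vec)]
    false_count = 0
    for i in range(n_vec):
        # Brute-force nearest neighbor search in m dimensions.
        d_m, nn = min(
            (math.dist(vecs[i], vecs[k]), k) for k in range(n_vec) if k != i
        )
        extra = abs(series[i + m * tau] - series[nn + m * tau])
        d_m1 = math.sqrt(d_m ** 2 + extra ** 2)
        crit1 = extra > r_tol * d_m   # relative distance increase exceeds r_tol
        crit2 = d_m1 / r_a > a_tol    # neighbor is far relative to the data spread
        if crit1 and crit2:
            false_count += 1
    return false_count / n_vec


# Illustrative series only: the fraction is reported for m = 1..4.
series = [math.sin(2 * math.pi * i / 40) for i in range(200)]
for m in range(1, 5):
    print(m, fnn_fraction(series, m, tau=10))
```

The search is quadratic in the number of phase points, so a spatial index would be preferable for a full hourly record spanning many months.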

###### 2.2.2. Proposed Methodology

The whole procedure of PSR-based prediction is illustrated in Figure 1.

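*Step 1 (PSR).* The one-dimensional time series $\{x_i\}_{i=1}^{N}$ is projected to higher dimensions using the PSR method to generate the high-dimensional series

$$X_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right), \tag{4}$$

where $i = 1, 2, \ldots, M$ and $M = N - (m-1)\tau$.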

The parameters $\tau$ and $m$ have been determined by using the AMI and FNN methods, respectively. The input and output (target) samples drawn from the reconstructed series can be represented by the matrices $X$ and $Y$, respectively. The data points of the last embedding dimension have been used as the target set in numerous studies [24–26]:

$$Y = \left(x_{1+(m-1)\tau}, x_{2+(m-1)\tau}, \ldots, x_{M+(m-1)\tau}\right)^{T}, \tag{5}$$

where $M$ is the number of phase points. The concept of multiple time scales has been used in various studies [26–29]; therefore, in this study, it is proposed to use the concept of multiple time scales for the computation of the target set in order to get a better prediction of the PSR series. Thus, the target values can be represented as

$$Y = \left(y_1, y_2, \ldots, y_M\right)^{T}, \tag{6}$$

where $y_i$ is the average of the $i$th row of the reconstructed series. Both the input ($X$) and the target ($Y$) of the reconstructed series are divided into two sets, namely, the training set and the test set. The training set consists of the reconstructed series from January 2014 to August 2015, while the test set comprises the September 2015 reconstructed series.

*Step 2 (prediction).* The regression model was built for the settings mentioned in step 1 using different learning algorithms (linear and radial SVRs, RF, and FFNN).

*Step 3 (results).* For the evaluation of prediction models, RMSE and MAE were computed.

In traditional PSR prediction, the last embedding dimension of the reconstructed time series is used as the target set, whereas in the proposed approach, the target set values are computed using the following equation:

$$y_i = \frac{1}{m} \sum_{j=0}^{m-1} x_{i+j\tau}, \tag{7}$$

where $i = 1, 2, \ldots, M$ indexes the phase points, $m$ is the embedding dimension, and $\tau$ is the time delay. Equation (7) is used in numerous studies [26–29] for constructing coarse-grained series at multiple time scales. In these studies, the original time series is divided into nonoverlapping windows, and each window is then averaged to construct the multiscale time series. In this study, we used the same approach (equation (7)) after transforming the original time series into a higher dimension, instead of applying it to the original time series (i.e., each row of the PSR series is averaged for computation of the target set at various $\tau$).
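The difference between the two target settings can be illustrated with a short Python sketch. The function name `psr_targets` and the toy series are hypothetical; only the two rules (last coordinate versus row average) come from the text.

```python
def psr_targets(series, m, tau):
    """Return the PSR rows with the traditional and proposed target sets."""
    n_points = len(series) - (m - 1) * tau
    rows = [[series[i + j * tau] for j in range(m)] for i in range(n_points)]
    traditional = [row[-1] for row in rows]    # last embedding dimension
    proposed = [sum(row) / m for row in rows]  # average of each PSR row
    return rows, traditional, proposed


rows, traditional, proposed = psr_targets([1, 2, 3, 4, 5, 6], m=3, tau=1)
print(traditional)  # [3, 4, 5, 6]
print(proposed)     # [2.0, 3.0, 4.0, 5.0]
```

The proposed target blends all $m$ delayed coordinates of a phase point, whereas the traditional target reflects only the most recent one.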

###### 2.2.3. Support Vector Regression (SVR)

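Consider a set of training data $\{(x_1, y_1), \ldots, (x_l, y_l)\}$, where each $x_i \in R^{n}$ represents the input samples with corresponding target value $y_i \in R$ for $i = 1, \ldots, l$ ($l$ represents the training data size) [34]. The generic SVR estimating function takes the following form:

$$f(x) = \left(w \cdot \phi(x)\right) + b, \tag{8}$$

where $w \in R^{n}$, $b \in R$, and $\phi$ represents a nonlinear transformation from $R^{n}$ to a high-dimensional space. The objective is to find the values of $w$ and $b$ such that values of $x$ can be determined by minimizing the regression risk:

$$R_{\mathrm{reg}}(f) = C \sum_{i=1}^{l} \Gamma\left(f(x_i) - y_i\right) + \frac{1}{2} \left\|w\right\|^{2}, \tag{9}$$

where $C$ is a constant, $\Gamma(\cdot)$ represents a cost function, and the vector $w$ can be written (in terms of data points) as

$$w = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) \phi(x_i), \tag{10}$$

with $\alpha_i$ and $\alpha_i^{*}$ denoting the Lagrange multipliers of the dual problem.

Using equations (10) and (8), the generic equation can be rewritten as

$$f(x) = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) \left(\phi(x_i) \cdot \phi(x)\right) + b = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) k(x_i, x) + b, \tag{11}$$

where $k(x_i, x)$ indicates the kernel function.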
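To make the kernel form of the estimating function concrete, the sketch below evaluates it for a linear and a radial (RBF) kernel. The dual coefficients, support vectors, bias, and the RBF width `gamma` are hand-picked hypothetical values, not the result of training; in practice they would come from solving the SVR optimization problem.

```python
import math


def linear_kernel(u, v):
    """k(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.5):
    """k(u, v) = exp(-gamma * ||u - v||^2); gamma is an assumed width."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, b, kernel):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical model: dual_coefs[i] stands for (alpha_i - alpha_i*).
support_vectors = [[1.0, 0.0], [0.0, 1.0]]
dual_coefs = [0.5, -0.25]
bias = 0.1

print(svr_predict([2.0, 2.0], support_vectors, dual_coefs, bias, linear_kernel))
print(svr_predict([1.0, 0.0], support_vectors, dual_coefs, bias, rbf_kernel))
```

Switching between the linear and radial SVR variants only changes the `kernel` argument; the prediction formula itself stays the same.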

###### 2.2.4. Random Forest (RF)

RF [38] is an ensemble approach that relies on classification and regression tree (CART) models. The purpose of CART is to learn the relationship between a dependent variable $Y$ and a set of predictor variables $P$. The learning algorithm employs recursive partitioning, which splits on the $P$ variables to create homogeneous groupings of $Y$. The recursive partitioning continues until the subset of $Y$ (at each node) has the same value. RF differs from the CART procedure by (a) employing bootstrap resampling [44] and (b) random variable selection. Consider a regression tree made up of splits and nodes. In RF, a random subset of $P$ is used to determine the split for each node. For continuous variables, the ensemble estimate is the mean of the predicted values across trees, $\operatorname{mean}(\hat{Y})$, and the variance across trees is $\operatorname{var}(\hat{Y})$.
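The two ingredients that distinguish RF from a single tree, bootstrap resampling and random variable selection, can be sketched with a deliberately simplified ensemble of one-split trees (stumps). This is an illustrative stand-in rather than a full CART-based random forest; all names, the toy data, and the choice of one random variable per tree are assumptions.

```python
import random
import statistics


def fit_stump(rows, targets, feature):
    """Best single split on one feature, chosen by squared error."""
    best = None
    values = sorted(set(r[feature] for r in rows))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= thr]
        right = [t for r, t in zip(rows, targets) if r[feature] > thr]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # feature is constant in this bootstrap sample
        mean = statistics.fmean(targets)
        return feature, float("inf"), mean, mean
    return feature, best[1], best[2], best[3]


def fit_forest(rows, targets, n_trees=50, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap resample
        feature = rng.randrange(len(rows[0]))           # random variable selection
        forest.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feature)
        )
    return forest


def predict_forest(forest, x):
    """Ensemble mean and variance of the per-tree predictions."""
    preds = [ml if x[f] <= thr else mr for f, thr, ml, mr in forest]
    return statistics.fmean(preds), statistics.pvariance(preds)


# Toy data: the target depends on the first variable only.
rows = [[float(i), 0.0] for i in range(10)]
targets = [0.0] * 5 + [10.0] * 5
forest = fit_forest(rows, targets)
print(predict_forest(forest, [1.0, 0.0]))
print(predict_forest(forest, [8.0, 0.0]))
```

The variance across trees gives a rough indication of how much the individual trees disagree for a given input.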

###### 2.2.5. Feedforward Neural Network (FFNN)

Neural networks are computing models used for the recognition of patterns or relations among data [38]. Neural networks comprise two main components: a set of nodes and the links between nodes.

The FFNN possesses a massive number of processing elements called neurons. These neurons are interlinked through weights and are arranged in input, hidden, and output layers. The weighted sum of the values at the input layer is applied to each of the hidden layer neurons. Similarly, the weighted sum of the values at the hidden layer is applied to the output layer. The output obtained (at the output layer) is given as

$$y_k = f_o\left(\sum_{j} w_{kj}\, f_h\left(\sum_{i} w_{ji}\, x_i + b_j\right) + b_k\right), \tag{12}$$

where $b$ and $w$ are the bias and weight parameters, $f_h$ and $f_o$ are the activation functions applied at the hidden and output layers, respectively, and $x_i$ indicates the input value at the input neuron $i$.
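A single forward pass through a one-hidden-layer network can be sketched as follows. The `tanh` hidden activation, the identity output activation, and the weight values are assumptions chosen for illustration; the study does not report the activation functions or network size here.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out, f_h=math.tanh, f_o=lambda z: z):
    """Forward pass: input layer -> hidden layer -> output layer."""
    # Each hidden neuron applies f_h to its weighted input sum plus bias.
    hidden = [
        f_h(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Each output neuron applies f_o to its weighted hidden sum plus bias.
    return [
        f_o(sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(w_out, b_out)
    ]


# Hypothetical 2-2-1 network with hand-picked weights.
w_hidden = [[1.0, -1.0], [0.5, 0.5]]
b_hidden = [0.0, 0.0]
w_out = [[2.0, 1.0]]
b_out = [0.5]
print(forward([1.0, 1.0], w_hidden, b_hidden, w_out, b_out))
```

For the input `[1.0, 1.0]`, the first hidden neuron receives 0 and the second receives 1, so the output equals `0.5 + tanh(1)`.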

###### 2.2.6. Performance Evaluation Measures

The root-mean-squared error (RMSE) and mean absolute error (MAE) have been widely used to measure the performance of prediction models. Both measures range from 0 to ∞, and lower values indicate better performance of the prediction model. RMSE is calculated by taking the square root of the mean squared error (MSE) and is sensitive to large errors. MAE is calculated by taking the average of the absolute differences between the actual and predicted values. Mathematically, RMSE and MAE can be calculated using equations (13) and (14), respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}, \tag{13}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|, \tag{14}$$

where $y_i$ represents the target (expected) values, $\hat{y}_i$ is the model’s predicted values, and $n$ is the number of test samples.
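Both measures are straightforward to compute; a minimal Python sketch with illustrative values (not data from this study) is shown below.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]
print(rmse(actual, predicted))  # 2.0
print(mae(actual, predicted))   # 1.0
```

Because the root mean square of a set of errors is never smaller than their mean absolute value, RMSE is always greater than or equal to MAE for the same predictions, and the gap widens when a few errors are much larger than the rest, as in this example.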

#### 3. Results and Discussion

The original time series of PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio are shown in Figure 2.

**Figure 2:** Original time series of PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio (panels (a)–(c)).

First, the phase space is reconstructed using equation (4). The selection of the two parameters $\tau$ and $m$ is important for PSR. AMI has been used for the computation of $\tau$. In Figure 3, the AMI is plotted against varying $\tau$ to obtain the optimal value of $\tau$ for PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio. The presence of chaos in atmospheric particulates reveals that the time series of PM_{2.5} and PM_{10.0} mass concentrations can be described and predicted even if the source information is a univariate time series.

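**Figure 3:** AMI plotted against the time delay $\tau$ for PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio (panels (a)–(c)).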

Figure 3 shows that the value of the AMI $I(\tau)$ decreases nonmonotonically with increasing $\tau$ for PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio. Hence, the value of $\tau$ at which the first minimum of $I(\tau)$ occurs is taken as the optimal $\tau$. The optimal $\tau$ for PM_{2.5} is 13, for PM_{10.0} is 12, and for the PM_{2.5}/PM_{10.0} ratio is 10.

The FNN approach is used to find the optimal minimum embedding dimension $m$. For any given $m$, the proportion of identified FNN among all the neighbors was computed for the given $\tau$. The percentages of FNN are plotted as a function of $m$. A zero FNN percentage indicates the minimum $m$.

The results of the FNN approach for determining the optimum $m$ of PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio using various values for the threshold parameters $R_{\mathrm{tol}}$ and $A_{\mathrm{tol}}$ are shown in Figure 4. One of the threshold parameters is varied from 130 to 190 with a step size of 30, while the other is held fixed. The $m$ obtained for PM_{2.5} is 5, and for PM_{10.0} and the PM_{2.5}/PM_{10.0} ratio it is 6. The higher values of $m$ show that the mass concentration time series of PM_{2.5} and PM_{10.0} have dominant degrees of freedom, which indicates that atmospheric particulate dynamics are complex in nature.

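**Figure 4:** Percentage of FNN as a function of the embedding dimension $m$ for PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio (panels (a)–(c)).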

Based on the values of the parameters $\tau$ and $m$, the phase space is reconstructed for PM_{2.5}, PM_{10.0}, and the PM_{2.5}/PM_{10.0} ratio, and prediction models (using different machine learning algorithms including RF, linear and radial SVRs, and FFNN) are built using the traditional and proposed settings. The predicted values for the next 1 month (i.e., September 2015) are obtained by using these learning models. For both training and testing data, the traditional and proposed settings of the target set from the PSR series have been used. Target and predicted values of PM_{2.5} are shown in Figure 5. It is clear from the figure that the results of the proposed PSR technique are more robust than those of the traditional PSR method for all learning algorithms. The learning algorithms FFNN, SVR-L, and RF show near-perfect overlap of the predicted and actual values for the proposed settings. Figure 6 shows the prediction results of PM_{10.0} for different learning algorithms using the proposed and traditional settings. The results reveal that, as for PM_{2.5}, the learning algorithms FFNN, SVR-L, and RF also show near-perfect overlap between predicted and actual values for PM_{10.0}. For both PM_{10.0} and PM_{2.5}, the prediction results of SVR-R are modest for both the proposed and traditional settings. It can be observed from Figure 6 that SVR-L shows model underestimation and SVR-R shows model overestimation for the traditional PSR settings. This may be due to the fact that in the traditional PSR, the target set is the last embedding dimension of the PSR series, whereas in the proposed PSR method, the target set is the row average of the reconstructed phase space data points. The averaging process yielded better prediction and avoided model over- and underestimation. Figure 7 shows the prediction results for the joined dataset (i.e., the ratio PM_{2.5}/PM_{10.0}). The proposed PSR showed a better prediction result, with FFNN and RF showing almost perfect overlap. The SVR-R showed model overestimation for both the traditional and proposed settings.

Prediction errors between actual and predicted values in terms of RMSE and MAE are presented in Table 1. The table compares the performance of the different machine learning algorithms using the proposed and traditional settings. The results show that the prediction error of all the machine learning techniques is smaller for the proposed PSR approach than for the traditional approach.

For PM_{2.5}, FFNN leads to the best results (both RMSE and MAE 0.04 *μ*gm^{−3}), followed by SVR-L (RMSE 0.01 *μ*gm^{−3} and MAE 0.09 *μ*gm^{−3}) and RF (RMSE 1.27 *μ*gm^{−3} and MAE 0.86 *μ*gm^{−3}). For PM_{10.0}, SVR-L leads to the best results (both RMSE and MAE 0.06 *μ*gm^{−3}), followed by FFNN (RMSE 0.13 *μ*gm^{−3} and MAE 0.09 *μ*gm^{−3}) and RF (RMSE 1.60 *μ*gm^{−3} and MAE 1.16 *μ*gm^{−3}). For PM_{2.5}/PM_{10.0}, FFNN is the most accurate predictor (0.001 for both RMSE and MAE), followed by RF (0.02 for both RMSE and MAE) and SVR-L (RMSE 0.05 and MAE 0.04).

Due to the geographical characteristics and climatic conditions, PM_{2.5} and PM_{10.0} pollutants frequently exceed the national and international air quality standards in the Makkah region [6, 33]. These particles are very small, and exposure to them is associated with adverse health effects. According to the World Health Organization (WHO), a reduction in annual PM_{10.0} concentration from 70 *μ*gm^{−3} to 20 *μ*gm^{−3} is associated with a 15% reduction in deaths [3]. Exposure to these pollutants affects not only the health of the local community but also that of the millions of pilgrims visiting Makkah annually. The current study can help to predict these pollutants and thereby support managerial solutions for preventing and/or mitigating adverse health implications.

#### 4. Conclusion

The traditional PSR prediction method generally uses the data points of the last embedding dimension of the PSR series (single scale) as the target set. APM mass concentrations are an outcome of complex natural and anthropogenic contributors evolving with time that may operate on multiple time scales. This study has proposed a novel PSR-based scientific solution that accounts for the information contained at multiple time scales. The optimal embedding dimension of PM_{2.5} is 5; for PM_{10.0} and the PM_{2.5}/PM_{10.0} ratio, it is 6. The higher values of the embedding dimension reveal the chaotic behavior of both atmospheric particulates. Different machine learning algorithms are used to predict APM mass concentrations using the proposed and traditional PSR techniques. The performance of the learning algorithms is evaluated using RMSE and MAE. The results demonstrate that the proposed modification of the PSR approach provides better prediction of APM than the traditional approach. Robust prediction is obtained with the FFNN learning model combined with the proposed modification of the PSR algorithm. The good prediction results indicate the usefulness of the proposed PSR approach and the suitability of combining it with various machine learning approaches for predicting mass concentrations of atmospheric particulates. The proposed technique can also be used for the analysis and prediction of interbeat interval time series, EEG time series, human gait dynamics, and financial time series data.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was partly funded by the King Abdulaziz City for Science and Technology (KACST) (grant no. 13-ENES2373-10).