Research Article  Open Access
A DBN-Based Deep Neural Network Model with Multitask Learning for Online Air Quality Prediction
Abstract
To avoid the adverse effects of severe air pollution on human health, accurate real-time air quality prediction is needed. In this paper, to improve the prediction accuracy of air pollutant concentrations, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series. The MTL-DBN-DNN model can solve several related prediction tasks at the same time by using shared information contained in the training data of different tasks. In the model, the DBN is used to learn feature representations. Each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. This local connection avoids a problem of fully connected networks, which must balance the learning of all tasks during training and therefore cannot reach optimal prediction accuracy for each task. A sliding window over the most recent data is used to dynamically adjust the parameters of the MTL-DBN-DNN model. The model is evaluated on a dataset from Microsoft Research. Comparison with multiple baseline models shows that the proposed MTL-DBN-DNN achieves state-of-the-art performance on air pollutant concentration forecasting.
1. Introduction
Air pollution is becoming increasingly serious. To protect human health and the environment, accurate real-time air quality prediction is sorely needed.
There are nonlinear and complex interactions among the variables of air quality prediction data. Artificial neural networks can act as a nonlinear system that expresses complex nonlinear maps, so they have been frequently applied to real-time air quality forecasting (e.g., [1–5]).
Deep networks have significantly greater representational power than shallow networks [6]. To overcome several difficulties of training deep networks, Hinton et al. proposed the deep belief network (DBN) in [7]. A DBN is trained via a greedy layer-wise training method and automatically extracts deep hierarchical abstract feature representations of the input data [8, 9]. Deep belief networks can be used for time series forecasting (e.g., [10–15]). For these reasons, the prediction model proposed in this paper is based on a deep neural network pretrained by a deep belief network.
Multitask learning can improve learning for one task by using the information contained in the training data of other related tasks [16]. Multitask deep neural networks have already been applied successfully to many real problems, such as multi-label learning [17], compound selectivity prediction [18], traffic flow prediction [19], speech recognition [20], categorical emotion recognition [21], and natural language processing [22]. Collobert and Weston demonstrated that a unified neural network architecture, trained jointly on related tasks, provides more accurate prediction results than a network trained only on a single task [22].
Current air quality prediction studies mainly focus on one kind of air pollutant and perform single-task forecasting. The most studied problem is PM_{2.5} concentration prediction. However, some of the air pollutants predicted here are correlated, so the corresponding prediction tasks are related. For example, SO_{2} and NO_{2} are related because they may come from the same pollution sources. Studies have shown that sulfate (SO_{4}^{2-}) is a major PM constituent in the atmosphere [23]. In 2016, a study revealed that the aqueous oxidation of SO_{2} by NO_{2}, under specific atmospheric conditions, is key to efficient sulfate formation, and that this chemical reaction led to the 1952 London “Killer” Fog [24]. A study published in the US journal Science Advances also found that fine water particles in the air act as a reactor, trapping sulfur dioxide (SO_{2}) molecules that interact with nitrogen dioxide (NO_{2}) to form sulfate [25]. Therefore, we can regard the concentration forecasting of these three kinds of pollutants (PM_{2.5}, SO_{2}, and NO_{2}) as related tasks. Figure 1 shows some of the historical monitoring data for the concentrations of the three kinds of pollutants at the target station selected in this study (Dongcheng Dongsi air-quality-monitor-station). The three kinds of pollutants show almost the same concentration trend, which supports treating their concentration forecasts as related tasks.
In this paper, based on the powerful representational ability of the DBN and the ability of multitask learning to transfer knowledge between tasks, a deep neural network model with multitask learning capabilities (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series. The DBN is used to learn feature representations, and several related tasks are solved simultaneously by using shared representations.
For multitask learning, a deep neural network with local connections is used in this study. This connection pattern avoids a problem of fully connected networks, which must balance the learning of all tasks during training and therefore cannot reach optimal prediction accuracy for each task. The locally connected architecture can learn both the commonalities and the differences of multiple tasks.
To obtain better predictions of future concentrations, a sliding window [26, 27] over the most recent data is used to dynamically adjust the parameters of the prediction model.
The rest of the paper is organized as follows. Section 2 presents the background knowledge of multitask learning, deep belief networks, and DBN-DNN and describes the DBN-DNN model with multitask learning (MTL-DBN-DNN). In Section 3, the proposed MTL-DBN-DNN model is applied to a case study of real-time forecasting of air pollutant concentrations, and the results and analysis are shown. Finally, Section 4 presents the conclusions of the paper.
2. Methods
2.1. Multitask Learning
Multitask learning can improve learning for one task by using the information contained in the training data of other related tasks. Multitask learning learns tasks in parallel and “what is learned for each task can help other tasks be learned better” [16].
Several related problems are solved at the same time by using a shared representation. Related learning tasks can share the information contained in their input data sets to a certain extent. Multitask learning exploits commonalities among different learning tasks, and this allows knowledge transfer among them. A neural network with multitask learning capabilities differs from a simple neural network with multiple outputs in the following way: in the multitask case, the input feature vector is made up of the features of each task, and the hidden layers are shared by multiple tasks. Multitask learning is often adopted when training data is very limited for the target task domain [28].
2.2. Deep Belief Networks and DBN-DNN
Deep belief networks (DBNs) [29] are probabilistic generative models built by stacking many layers of restricted Boltzmann machines (RBMs), each of which contains a layer of visible units and a layer of hidden units. A DBN can be trained to extract a deep hierarchical representation of the input data using a greedy layer-wise procedure. After a layer of RBM has been trained, the representations of the previous hidden layer are used as inputs for the next hidden layer. A schematic representation of a DBN is shown in Figure 2.
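The greedy layer-wise procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes binary (Bernoulli-Bernoulli) RBMs trained with one-step contrastive divergence (CD-1) on the full batch, and the function names `train_rbm` and `pretrain_dbn`, the learning rate, and the epoch count are all illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05, rng=None):
    """Train one Bernoulli-Bernoulli RBM with CD-1 (illustrative settings)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and samples given the data.
        p_h0 = sigmoid(data @ W + b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        p_v1 = sigmoid(h0 @ W.T + b_v)
        p_h1 = sigmoid(p_v1 @ W + b_h)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / n_samples
        b_v += lr * (data - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h

def pretrain_dbn(data, hidden_sizes, **rbm_kwargs):
    """Stack RBMs greedily: each RBM is trained on the layer below's output."""
    layers, x = [], data
    for n_hidden in hidden_sizes:
        W, b_h = train_rbm(x, n_hidden, **rbm_kwargs)
        layers.append((W, b_h))
        x = sigmoid(x @ W + b_h)  # hidden activations become the next input
    return layers
```

The returned list of `(W, b_h)` pairs is what would later initialize the hidden layers of a feed-forward network before fine-tuning. A real-valued input layer would need a Gaussian-Bernoulli RBM at the bottom, which this sketch omits.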
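A DBN with $l$ hidden layers contains $l$ weight matrices: $W^{(1)}, \ldots, W^{(l)}$. It also contains $l+1$ bias vectors: $b^{(0)}, \ldots, b^{(l)}$, with $b^{(0)}$ providing the biases for the visible layer. The probability distribution represented by the DBN is given by
$$P(h^{(l)}, h^{(l-1)}) \propto \exp\left(b^{(l)\top} h^{(l)} + b^{(l-1)\top} h^{(l-1)} + h^{(l-1)\top} W^{(l)} h^{(l)}\right),$$
$$P(h_i^{(k)} = 1 \mid h^{(k+1)}) = \sigma\left(b_i^{(k)} + W_{:,i}^{(k+1)\top} h^{(k+1)}\right) \quad \forall i,\ \forall k \in \{1, \ldots, l-2\},$$
$$P(v_i = 1 \mid h^{(1)}) = \sigma\left(b_i^{(0)} + W_{:,i}^{(1)\top} h^{(1)}\right) \quad \forall i,$$
where $\sigma$ denotes the logistic sigmoid function. In the case of real-valued visible units, substitute
$$v \sim \mathcal{N}\left(v;\ b^{(0)} + W^{(1)\top} h^{(1)},\ \beta^{-1}\right)$$
with $\beta$ diagonal for tractability [30].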
The weights of the trained DBN can be used as the initial weights of a DNN [8, 30]; all of the weights are then fine-tuned by backpropagation or another discriminative algorithm to improve the performance of the whole network. When a DBN is used to initialize the parameters of a DNN, the resulting network is called a DBN-DNN [31].
2.3. DBN-Based Deep Neural Network Model with Multitask Learning (MTL-DBN-DNN)
In this section, a DBN-based multitask deep neural network prediction model is proposed to solve multiple related tasks simultaneously by using shared information contained in the training data of different tasks.
The DBN-DNN prediction model with multitask learning consists of a DBN and an output layer with multiple units. The deep belief network is used to extract better feature representations, and several related tasks are solved simultaneously by using shared representations. The sigmoid function is used as the activation function of the output layer.
Each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. Let N be the number of related tasks and let α be the size of each subset, expressed as the ratio of the number of nodes in the subset to the number of nodes in the entire last hidden layer; then 1/(N-1) > α > 1/N. At the locally connected layer, each output node has a portion of hidden nodes that are connected only to it; let β be the size of this portion, expressed as the same kind of ratio; then 0 < β < 1/N. Two adjacent subsets share a specified number of common units.
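One way to realize such a locally connected output layer is a binary mask over the last weight matrix. The sketch below is only an illustration under stated assumptions: the subsets are taken to be contiguous, evenly spaced windows of hidden units, and the function name `local_connection_mask` and the example sizes (90 hidden units, 3 tasks, 40 units per subset) are hypothetical choices, not values confirmed by the paper for the subset size.

```python
import numpy as np

def local_connection_mask(n_hidden, n_tasks, subset_size):
    """Return an (n_hidden, n_tasks) 0/1 mask; column k marks task k's subset.

    Assumption: subsets are contiguous windows with evenly spaced starts, so
    the first window begins at unit 0 and the last ends at unit n_hidden - 1.
    """
    alpha = subset_size / n_hidden
    if not (1.0 / n_tasks < alpha < 1.0 / (n_tasks - 1)):
        raise ValueError("subset ratio must satisfy 1/N < alpha < 1/(N-1)")
    mask = np.zeros((n_hidden, n_tasks), dtype=int)
    starts = np.round(np.linspace(0, n_hidden - subset_size, n_tasks)).astype(int)
    for task, start in enumerate(starts):
        mask[start:start + subset_size, task] = 1
    return mask

mask = local_connection_mask(n_hidden=90, n_tasks=3, subset_size=40)
# Units connected to exactly one output are task-specific.
unique_per_task = [int(((mask.sum(axis=1) == 1) & (mask[:, k] == 1)).sum())
                   for k in range(3)]
# Units shared by two adjacent subsets.
shared_01 = int((mask[:, 0] & mask[:, 1]).sum())
shared_12 = int((mask[:, 1] & mask[:, 2]).sum())
```

With these example sizes, α = 40/90 lies between 1/3 and 1/2, each task keeps some units of its own, and adjacent subsets overlap. During training, the output weight matrix would be multiplied elementwise by `mask` after every update so that masked connections stay at zero.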
The MTL-DBN-DNN model is learned with unsupervised DBN pretraining followed by backpropagation fine-tuning. The architecture of the MTL-DBN-DNN model is shown in Figure 3.
Remark. First, pretraining and fine-tuning ensure that the information in the weights comes from modeling the input data [32]. In other words, the network memorizes the information of the training data via the weights. The network needs to learn not only the commonalities of multiple tasks but also their differences. A locally connected network allows a subset of hidden units to be unique to one of the tasks, and unique units can better model the task-specific information. Therefore, fully connected networks do not learn the information contained in the training data of multiple tasks better than locally connected networks. Second, fully connected networks need to juggle (i.e., balance) the learning of each task while being trained, so the trained networks cannot reach optimal prediction accuracy for each task. For these two reasons, the last (fully connected) layer is replaced by a locally connected layer, and each unit in the output layer is connected to only a subset of units in the previous layer. Two adjacent subsets share a specified number of common units.
Input. As long as a feature is statistically relevant to one of the tasks, the feature is used as an input variable to the model.
When the MTL-DBN-DNN model is used for time series forecasting, the parameters of the model can be dynamically adjusted according to the recent monitoring data taken by the sliding window to achieve online forecasting.
The Setting of the Structures and Parameters. The architecture and parameters of the MTL-DBN-DNN can be set according to the practical guide for training RBMs in technical report [33].
3. Experiments
3.1. Data Set
In this study, we used a data set collected in the Urban Air project (Urban Computing Team, Microsoft Research) over a period of one year (from 1 May 2014 to 30 April 2015) [34]. There are missing values in the data, so the data was preprocessed in this study. We chose the Dongcheng Dongsi air-quality-monitor-station, located in Beijing, as the target station. The hourly concentrations of PM_{2.5}, NO_{2}, and SO_{2} at the station were predicted 12 hours in advance.
3.2. Feature Set
According to some research results, we let the factors that may be relevant to the concentration forecasting of three kinds of air pollutants make up a set of candidate features.
Traffic emission is one of the sources of air pollutants. The traffic flow differs between weekdays and weekends, and traffic density increases notably during the morning and afternoon rush hours. In this paper, the hour of day and the day of week were used as proxies for traffic flow data, which is not easy to obtain.
Anthropogenic activities that lead to air pollution are different at different times of a year. The day of year (DAY) [3] was used as a representation of the different times of a year, and it is calculated by
$$\mathrm{DAY} = \cos\left(\frac{2\pi d}{T}\right),$$
where $d$ represents the ordinal number of the day in the year and T is the number of days in this year.
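As a small illustration of the day-of-year feature, the sketch below assumes the cosine encoding DAY = cos(2πd/T) associated with [3]; that specific functional form is an assumption here, and the function name `day_of_year_feature` is hypothetical.

```python
import calendar
import math
from datetime import date

def day_of_year_feature(day: date) -> float:
    """Cyclic day-of-year feature, assuming DAY = cos(2*pi*d/T)."""
    d = day.timetuple().tm_yday                    # ordinal number of the day
    T = 366 if calendar.isleap(day.year) else 365  # number of days in the year
    return math.cos(2 * math.pi * d / T)
```

A cyclic encoding of this kind gives 31 December and 1 January nearly the same feature value, which a raw ordinal number would not.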
Regional transport of atmospheric pollutants may be an important factor that affects the concentrations of air pollutants. Three transport corridors are tracked by 24 h backward trajectories of air masses in the Jing-Jin-Ji area [3, 35], and they are presented in Figure 4. According to the current wind direction and the transport corridors of air masses, we selected a nearby city located in the upwind direction of Beijing. Then we used the monitoring data of the concentrations of six kinds of air pollutants from a station located in that city to represent the current pollutant concentrations of the selected nearby city.
[Figure 4, panels (a) to (c)]
Candidate features include meteorological data from the target station whose three air pollutant concentrations are to be predicted (weather, temperature, pressure, humidity, wind speed, and wind direction), the current concentrations of six kinds of air pollutants at the target station and at the selected nearby city (PM_{2.5}, PM_{10}, SO_{2}, NO_{2}, CO, and O_{3}), the hour of day, the day of week, and the day of year. Weather has 17 different conditions: sunny, cloudy, overcast, rainy, sprinkle, moderate rain, heavy rain, rain storm, thunder storm, freezing rain, snowy, light snow, moderate snow, heavy snow, foggy, sand storm, and dusty. All feature numbers are presented in Table 1.

3.3. Evaluation Metrics
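In this study, four performance indicators, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy (Acc) [34], were used to assess the performance of the models. They are defined by
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|O_i - P_i\right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(O_i - P_i\right)^2},$$
$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left|\frac{O_i - P_i}{O_i}\right|,$$
$$\mathrm{Acc} = 1 - \frac{\sum_{i=1}^{N} \left|O_i - P_i\right|}{\sum_{i=1}^{N} O_i},$$
where N is the number of time points and $O_i$ and $P_i$ represent the observed and predicted values, respectively.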
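For concreteness, the four indicators can be computed as in the following NumPy sketch. It assumes the standard definitions of MAE, RMSE, and MAPE (MAPE reported as a fraction rather than a percentage) and the accuracy measure of [34], one minus the ratio of total absolute error to total observed value; the function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(observed, predicted):
    """Return MAE, RMSE, MAPE (as a fraction), and Acc for two series."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    abs_err = np.abs(o - p)
    return {
        "MAE": float(abs_err.mean()),
        "RMSE": float(np.sqrt(((o - p) ** 2).mean())),
        "MAPE": float((abs_err / o).mean()),  # assumes no zero observations
        "Acc": float(1.0 - abs_err.sum() / o.sum()),
    }

scores = evaluate([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Note that MAPE is undefined when an observed value is zero, so hours with a zero reading would need to be excluded or handled separately.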
3.4. Experiment Setup
A new data element arrives each hour. Each data element, together with the features that determine it, constitutes a training sample made up of a feature set and three target values: the PM_{2.5} concentration, the NO_{2} concentration, and the SO_{2} concentration. The feature set is made up of the factors that may be relevant to the concentration forecasting of the three kinds of pollutants.
Setting the Parameters of Sliding Window (Window Size, Step Size, Horizon). In this study, the concentrations of PM_{2.5}, NO_{2}, and SO_{2} were predicted 12 hours in advance, so the horizon was set to 12. The window size was 1220; that is, the sliding window always contained 1220 elements. The step size was set to 1. After the current concentration was monitored, the sliding window moved one step forward, the prediction model was trained with the 1220 training samples corresponding to the elements contained in the sliding window, and then the well-trained model was used to predict the responses of the target instances.
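The bookkeeping of the sliding window can be sketched as an index schedule. This is an interpretation rather than the authors' code: it assumes that each window element is an observed concentration whose features were recorded `horizon` hours earlier, and the generator name `online_forecast_schedule` is hypothetical.

```python
def online_forecast_schedule(n_hours, window_size=1220, step=1, horizon=12):
    """Yield (train_targets, predict_from, predict_for) for each window position.

    train_targets: range of hour indices whose observed concentrations serve as
                   training targets (their features come from `horizon` hours
                   earlier, an assumption of this sketch).
    predict_from:  the current hour, whose features feed the retrained model.
    predict_for:   the hour being forecast, `horizon` hours ahead.
    """
    first_now = horizon + window_size - 1  # earliest hour with a full window
    for now in range(first_now, n_hours - horizon, step):
        train_targets = range(now - window_size + 1, now + 1)
        yield train_targets, now, now + horizon

# Small example: 50 hours of data, a 5-element window, a 3-hour horizon.
schedule = list(online_forecast_schedule(50, window_size=5, step=1, horizon=3))
```

At each yielded position the model would be retrained on the samples in `train_targets` and then used once, for the forecast of hour `predict_for`.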
Selecting Features Relevant to Each Task. The experimental procedures are as follows:
(1) After the continuous variables were discretized, the features were evaluated and sorted for each task according to the minimal-redundancy-maximal-relevance (mRMR) criterion.
First, the continuous variables were discretized, and the discretized response variable became a class label with numerical significance. In this paper, continuous variables were divided into 20 levels. MIToolbox, a mutual information package by Adam Pocock, was used to evaluate the importance of the features according to the mRMR criterion.
(2) The dataset was divided into a training set and a test set. For each task, we used random forest to test the feature subsets from top-1 to top-n according to the feature importance ranking, and then selected the first n features corresponding to the minimum value of the MAE as the optimal feature subset. The curves of MAE are depicted in Figure 5. Table 2 shows the selected features relevant to each task.
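Step (1) can be illustrated with a self-contained NumPy sketch. It is a stand-in for the MIToolbox workflow, not a reproduction of it: it assumes equal-width binning into 20 levels, plug-in mutual information estimates, and the difference form of mRMR (relevance minus mean redundancy). All function names are hypothetical.

```python
import numpy as np

def discretize(x, levels=20):
    """Equal-width binning of a continuous vector into levels 0..levels-1."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), levels + 1)[1:-1]  # interior edges
    return np.digitize(x, edges)

def mutual_information(a, b):
    """Plug-in mutual information (nats) between two integer-coded vectors."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)          # joint histogram of co-occurrences
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def mrmr_rank(X, y, levels=20):
    """Greedy mRMR ranking of the columns of X with respect to target y."""
    n_features = X.shape[1]
    Xd = np.column_stack([discretize(X[:, j], levels) for j in range(n_features)])
    yd = discretize(y, levels)
    relevance = [mutual_information(Xd[:, j], yd) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]
    remaining = [j for j in range(n_features) if j != selected[0]]
    while remaining:
        scores = []
        for j in remaining:
            redundancy = np.mean([mutual_information(Xd[:, j], Xd[:, s])
                                  for s in selected])
            scores.append(relevance[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

The resulting ranking would then feed step (2), where nested top-1 to top-n subsets are scored with a separate regressor and the subset with the lowest MAE is kept.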

To verify whether multitask learning and online forecasting each improve the forecasting accuracy of the DBN-DNN, and to assess the capability of the proposed MTL-DBN-DNN to predict air pollutant concentrations, we compared the proposed MTL-DBN-DNN model with four baseline models (2–5):
(1) DBN-DNN model with multitask learning using the online forecasting method (OL-MTL-DBN-DNN).
(2) DBN-DNN model using the online forecasting method (OL-DBN-DNN).
(3) DBN-DNN model.
(4) Air-Quality-Prediction-Hackathon-Winning-Model (Winning-Model) [36].
(5) A hybrid predictive model (FFA) proposed by Zheng et al. [34].
For the single task prediction model, the input of the model is the selected features relevant to single task. For the multitask prediction model, as long as a feature is relevant to one of the tasks, the feature is used as an input variable to the model.
Remark. For the first two models (MTL-DBN-DNN and DBN-DNN), we used the online forecasting method. To distinguish them from static forecasting models, the models using the online forecasting method are denoted by OL-MTL-DBN-DNN and OL-DBN-DNN, respectively.
For the first three models above, we used the same DBN architecture and parameters. According to the practical guide for training RBMs in technical report [33] and the dataset used in the study, we set the architecture and parameters of the deep neural network as follows. The deep neural network consisted of a DBN with layers of size G-100-100-100-90 and a top output layer, where G is the number of input variables. The DBN was constructed by stacking four RBMs, and a Gaussian-Bernoulli RBM was used as the first layer. In the pretraining stage, the learning rate was set to 0.00001, and the number of training epochs was set to 50. In the fine-tuning stage, we used 10 iterations, and grid search was used to find a suitable learning rate. For the OL-MTL-DBN-DNN model, the output layer contained three units and simultaneously output the predicted concentrations of the three kinds of pollutants. Each unit in the output layer was connected to only a subset of units in the last hidden layer of the DBN.
For Winning-Model, time back was set to 4. Since the dataset used in this study was released by the authors of [34], the experimental results given in the original paper for the FFA model were quoted for comparison.
Because the first two models above use the online forecasting method, their training set changes over time. For a fair comparison, we selected the original 1220 elements contained in the window before the sliding window began to slide forward and used the samples corresponding to these elements as the training samples of the static prediction models (DBN-DNN and Winning-Model). The four models were used to predict the concentrations of the three kinds of pollutants in the same period. The experimental results of hourly concentration forecasting for a 12-h horizon are shown in Table 3, where the best results are marked in italics.

3.5. Results and Discussions
Table 3 shows that the best results are obtained by using the OL-MTL-DBN-DNN method for concentration forecasting. The three error evaluation criteria (MAE, RMSE, and MAPE) of the OL-MTL-DBN-DNN are lower than those of the baseline models, and its accuracy is significantly higher than that of the baseline models. The prediction performance of OL-DBN-DNN is better than that of DBN-DNN, which shows that the online forecasting method can improve the prediction performance. The performance of OL-MTL-DBN-DNN surpasses that of OL-DBN-DNN, which shows that multitask learning is an effective approach to improving the forecasting accuracy of air pollutant concentrations and demonstrates that it is necessary to share the information contained in the training data of the three prediction tasks. It is worth mentioning that learning tasks in parallel to get the forecast results is more efficient than training a model separately for each task.
The experimental results show that the OL-MTL-DBN-DNN model proposed in this paper achieves better prediction performance than the Air-Quality-Prediction-Hackathon-Winning-Model and the FFA model, and the prediction accuracy is greatly improved. For example, when we predict PM_{2.5} concentrations, compared with Winning-Model, the MAE and RMSE of OL-MTL-DBN-DNN are reduced by about 5.11 and 4.34, respectively, and the accuracy of OL-MTL-DBN-DNN is improved by about 13%. These positive results demonstrate that our MTL-DBN-DNN model is promising for real-time air pollutant concentration forecasting.
When the prediction time interval in advance is set to 12 hours, some prediction results of three models are presented in Figure 6.
[Figure 6 panels: (a) OL-MTL-DBN-DNN; (b) OL-DBN-DNN; (c) Winning-Model]
Figure 6 shows that the predicted and observed concentrations match very well when the OL-MTL-DBN-DNN is used. The advantage of the OL-MTL-DBN-DNN is most evident when it is used to predict sudden changes and high peaks in concentrations.
4. Conclusion
In this paper, a deep neural network model with multitask learning (MTL-DBN-DNN), pretrained by a deep belief network (DBN), is proposed for forecasting of nonlinear systems and tested on the forecast of air quality time series.
The MTL-DBN-DNN model can solve several prediction tasks at the same time by using shared information. In the model, each unit in the output layer is connected to only a subset of units in the last hidden layer of the DBN. Two adjacent subsets share a specified number of common units. This connection pattern avoids a problem of fully connected networks, which must balance the learning of all tasks during training and therefore cannot reach optimal prediction accuracy for each task. The locally connected architecture can learn both the commonalities and the differences of multiple tasks.
PM_{2.5}, SO_{2}, and NO_{2} are linked by chemical reactions and show almost the same concentration trend, so we applied the proposed model to a case study on forecasting the concentrations of these three kinds of air pollutants 12 hours in advance. Comparison with multiple baseline models shows that our MTL-DBN-DNN model has a stronger capability of predicting air pollutant concentrations. Therefore, by combining the advantages of deep learning, multitask learning, and online forecasting, the MTL-DBN-DNN model is able to provide accurate real-time concentration predictions of air pollutants.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Additional Points
Section 3.2 of this paper (feature set) cites the author’s conference paper [37].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by National Natural Science Foundation of China (61873008) and Beijing Municipal Natural Science Foundation (4182008).
References
[1] P. S. G. De Mattos Neto, F. Madeiro, T. A. E. Ferreira, and G. D. C. Cavalcanti, “Hybrid intelligent system for air quality forecasting using phase adjustment,” Engineering Applications of Artificial Intelligence, vol. 32, pp. 185–191, 2014.
[2] K. Siwek and S. Osowski, “Improving the accuracy of prediction of PM_{10} pollution by the wavelet transformation and an ensemble of neural predictors,” Engineering Applications of Artificial Intelligence, vol. 25, no. 6, pp. 1246–1258, 2012.
[3] X. Feng, Q. Li, Y. Zhu, J. Hou, L. Jin, and J. Wang, “Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation,” Atmospheric Environment, vol. 107, pp. 118–128, 2015.
[4] W. Tamas, G. Notton, C. Paoli, M.-L. Nivet, and C. Voyant, “Hybridization of air quality forecasting models using machine learning and clustering: An original approach to detect pollutant peaks,” Aerosol and Air Quality Research, vol. 16, no. 2, pp. 405–416, 2016.
[5] A. Kurt and A. B. Oktay, “Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks,” Expert Systems with Applications, vol. 37, no. 12, pp. 7986–7992, 2010.
[6] A. Y. Ng, J. Ngiam, C. Y. Foo, Y. Mai, and C. Suen, Deep Networks: Overview, 2013, http://deeplearning.stanford.edu/wiki/index.php/Deep_Networks:_Overview.
[7] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[9] S. Azizi, F. Imani, B. Zhuang et al., “Ultrasound-based detection of prostate cancer using automatic feature selection with deep belief networks,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds., vol. 9350 of Lecture Notes in Computer Science, pp. 70–77, Springer, Munich, Germany, 2015.
[10] M. Qin, Z. Li, and Z. Du, “Red tide time series forecasting by combining ARIMA and deep belief network,” Knowledge-Based Systems, vol. 125, pp. 39–52, 2017.
[11] X. Sun, T. Li, Q. Li, Y. Huang, and Y. Li, “Deep belief echo-state network and its application to time series prediction,” Knowledge-Based Systems, vol. 130, pp. 17–29, 2017.
[12] T. Kuremoto, S. Kimura, K. Kobayashi, and M. Obayashi, “Time series forecasting using a deep belief network with restricted Boltzmann machines,” Neurocomputing, vol. 137, pp. 47–56, 2014.
[13] F. Shen, J. Chao, and J. Zhao, “Forecasting exchange rate using deep belief networks and conjugate gradient method,” Neurocomputing, vol. 167, pp. 243–253, 2015.
[14] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, “Deep belief network based electricity load forecasting: An analysis of Macedonian case,” Energy, vol. 115, pp. 1688–1700, 2016.
[15] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, “Deep belief network based deterministic and probabilistic wind speed forecasting approach,” Applied Energy, vol. 182, pp. 80–93, 2016.
[16] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.
[17] Y. Huang, W. Wang, L. Wang, and T. Tan, “Multi-task deep neural network for multi-label learning,” in Proceedings of the IEEE International Conference on Image Processing, pp. 2897–2900, Melbourne, Australia, 2013.
[18] R. Zhang, J. Li, J. Lu, R. Hu, Y. Yuan, and Z. Zhao, “Using deep learning for compound selectivity prediction,” Current Computer-Aided Drug Design, vol. 12, no. 1, pp. 5–14, 2016.
[19] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: deep belief networks with multitask learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2191–2201, 2014.
[20] D. Chen and B. Mak, “Multitask learning of deep neural networks for low-resource speech recognition,” IEEE Transactions on Audio, Speech and Language, vol. 23, no. 7, pp. 1172–1183, 2015.
[21] R. Xia and Y. Liu, “Leveraging valence and activation information via multi-task learning for categorical emotion recognition,” in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, pp. 5301–5305, Brisbane, Australia, April 2015.
[22] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, Helsinki, Finland, July 2008.
[23] R. M. Harrison, A. M. Jones, and R. G. Lawrence, “Major component composition of PM10 and PM2.5 from roadside and urban background sites,” Atmospheric Environment, vol. 38, no. 27, pp. 4531–4538, 2004.
[24] G. Wang, R. Zhang, M. E. Gomez et al., “Persistent sulfate formation from London Fog to Chinese haze,” Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 48, pp. 13630–13635, 2016.
[25] Y. Cheng, G. Zheng, C. Wei et al., “Reactive nitrogen chemistry in aerosol water as a source of sulfate during haze events in China,” Science Advances, vol. 2, Article ID e1601530, 2016.
[26] D. Agrawal and A. E. Abbadi, “Supporting sliding window queries for continuous data streams,” in IEEE International Conference on Scientific and Statistical Database Management, pp. 85–94, Cambridge, Massachusetts, USA, 2003.
[27] K. B. Shaban, A. Kadri, and E. Rezk, “Urban air pollution monitoring system with forecasting models,” IEEE Sensors Journal, vol. 16, no. 8, pp. 2598–2606, 2016.
[28] L. Deng and D. Yu, “Deep learning: methods and applications,” in Foundations and Trends® in Signal Processing, vol. 7, pp. 197–391, Now Publishers Inc, Hanover, MA, USA, 2014.
[29] G. E. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, article no. 5947, 2009.
[30] Y. Bengio, I. Goodfellow, and A. Courville, Deep Generative Models, Deep Learning, MIT Press, Cambridge, Mass, USA, 2017.
[31] G. Hinton, L. Deng, D. Yu et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[32] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[33] G. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K.-R. Müller, Eds., vol. 7700 of Lecture Notes in Computer Science, pp. 599–619, Springer, Berlin, Germany, 2nd edition, 2012.
[34] Y. Zheng, X. Yi, M. Li et al., “Forecasting fine-grained air quality based on big data,” in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '15), pp. 2267–2276, Sydney, Australia, August 2015.
[35] X. Feng, Q. Li, Y. Zhu, J. Wang, H. Liang, and R. Xu, “Formation and dominant factors of haze pollution over Beijing and its peripheral areas in winter,” Atmospheric Pollution Research, vol. 5, no. 3, pp. 528–538, 2014.
[36] “Winning Code for the EMC Data Science Global Hackathon (Air Quality Prediction), 2012,” https://github.com/benhamner/Air-Quality-Prediction-Hackathon-Winning-Model.
[37] J. Li, X. Shao, and H. Zhao, “An online method based on random forest for air pollutant concentration forecasting,” in Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 2018.
Copyright
Copyright © 2019 Jiangeng Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.