Advanced Techniques for Computational and Information Sciences
Research Article | Open Access
Nonlinear Partial Least Squares for Consistency Analysis of Meteorological Data
Considering the different types of error and the nonlinearity of meteorological measurement, this paper proposes a nonlinear partial least squares method for consistency analysis of meteorological data. For a meteorological element from one automated weather station, the proposed method builds a prediction model from the corresponding meteorological elements of the surrounding automated weather stations to determine whether the measured values are abnormal. In the proposed method, partial least squares (PLS) extracts the latent variables of the independent variables and the dependent variables. These latent variables are then used, respectively, as the inputs and outputs of a neural network that serves as the nonlinear internal model of PLS. The proposed method thereby overcomes the limitation of traditional nonlinear PLS, whose inner model is a fixed quadratic function or spline function. Two typical neural networks are used in the proposed method: the back propagation neural network and the adaptive neuro-fuzzy inference system (ANFIS). Experiments are performed on real data from the atmospheric observation equipment operation monitoring system of Shaanxi Province, China. The experimental results verify that the nonlinear PLS with an ANFIS internal model is the most effective of the compared methods and can perform the consistency analysis of meteorological data correctly.
1. Introduction
Meteorological observation is the basis for correct weather analysis, forecasting, and severe weather warning. For many meteorological elements, there is no consistent methodology for the related measuring instruments. Therefore, consistency analysis of meteorological data is of considerable practical importance for scientific research and for the management of meteorological data resources.
Consistency analysis of meteorological data is based on the continuity and uniformity of the spatial distribution of meteorological elements. To determine whether the meteorological data are abnormal, the observations of an automated weather station are compared with prediction values. These prediction values are calculated from the corresponding meteorological data of the surrounding automated weather stations. The consistency analysis methods of meteorological data include the spatial interpolation algorithm, the spatial regression test, the Madsen-Allerup approach, and the climate statistical comparison method [5–9]. These methods assume a linear relationship in the spatial distribution of meteorological elements; namely, the measured values of a meteorological element at automated weather stations located near each other are highly similar. Since most meteorological measurement processes are nonlinear, the adequacy of these linear methods is limited. A nonlinear autoregressive neural network has been presented for consistency analysis of meteorological data; this approach can learn nonlinear dependencies from a large volume of potentially noisy data. Nevertheless, because a neural network is a black box, it cannot provide understandable heuristic knowledge. Partial least squares (PLS) is a classical regression method. PLS synthesizes the information in the independent variables and extracts the latent variables with the best explanatory capability. PLS has been widely used in many different domains [13–16]. At its core, however, PLS is a linear regression method. Nonlinear PLS, which uses a polynomial or spline function as the internal model, has been proposed to improve regression accuracy. However, the forms of polynomial and spline functions are restricted, whereas a neural network (NN) can approximate a nonlinear function with arbitrary precision. Hence, using an NN to build the internal model of PLS is expected to reduce the residual and improve effectiveness.
In this paper, a nonlinear partial least squares method for consistency analysis of meteorological data is proposed. For a meteorological element from one automated weather station, the proposed method builds a prediction model from the corresponding meteorological elements of the surrounding automated weather stations to determine whether the measured values are abnormal. In the proposed method, PLS extracts the latent variables of the independent variables and the dependent variables. These latent variables are then used, respectively, as the inputs and outputs of a neural network that serves as the nonlinear internal model of PLS. Two typical neural networks are used in the proposed method: the back propagation neural network and the adaptive neuro-fuzzy inference system. Experiments are performed on real data from the atmospheric observation equipment operation monitoring system of Shaanxi Province, China. The rest of this paper is organized as follows. Section 2 presents the proposed method in detail. Section 3 discusses the experimental results. Finally, Section 4 concludes the paper.
2. The Proposed Method
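For the consistency analysis of meteorological data, $\mathbf{Y}$ represents the values of a meteorological element of an automated weather station that are to be predicted. $\mathbf{X}$ represents the measured values of the corresponding meteorological element of the surrounding automated weather stations. The flowchart of the proposed method is shown in Figure 1, where $\mathbf{t}$ is the principal component (latent variable) of $\mathbf{X}$, $\mathbf{u}$ is the principal component of $\mathbf{Y}$, $\mathbf{E}$ is the residual matrix of $\mathbf{X}$, and $\mathbf{F}$ is the residual matrix of $\mathbf{Y}$.
The steps of the proposed method are explained as follows.
Step 1. Let $h = 1$, $\mathbf{E}_0 = \mathbf{X}$, and $\mathbf{F}_0 = \mathbf{Y}$, and initialize $\mathbf{u}_h = \mathbf{f}_j$, where $\mathbf{f}_j$ is the $j$th column vector of $\mathbf{F}_{h-1}$.
Step 2. Calculate $\mathbf{w}_h$, the weight vector of $\mathbf{E}_{h-1}$: $\mathbf{w}_h = \mathbf{E}_{h-1}^{T}\mathbf{u}_h / (\mathbf{u}_h^{T}\mathbf{u}_h)$.
Step 3. Normalize $\mathbf{w}_h$: $\mathbf{w}_h = \mathbf{w}_h / \lVert \mathbf{w}_h \rVert$.
Step 4. Calculate the score vector $\mathbf{t}_h = \mathbf{E}_{h-1}\mathbf{w}_h$.
Step 5. Calculate $\mathbf{q}_h$, the weight vector of $\mathbf{F}_{h-1}$: $\mathbf{q}_h = \mathbf{F}_{h-1}^{T}\mathbf{t}_h / (\mathbf{t}_h^{T}\mathbf{t}_h)$.
Step 6. Normalize $\mathbf{q}_h$: $\mathbf{q}_h = \mathbf{q}_h / \lVert \mathbf{q}_h \rVert$.
Step 7. Calculate the score vector $\mathbf{u}_h = \mathbf{F}_{h-1}\mathbf{q}_h$.
Step 8. If the change of $\mathbf{t}_h$ at the $k$th iteration, $\lVert \mathbf{t}_h^{(k)} - \mathbf{t}_h^{(k-1)} \rVert$, is less than or equal to the threshold, go to the next step; otherwise, go to Step 2.
Step 9. Calculate $\mathbf{p}_h$, the loading vector of $\mathbf{E}_{h-1}$: $\mathbf{p}_h = \mathbf{E}_{h-1}^{T}\mathbf{t}_h / (\mathbf{t}_h^{T}\mathbf{t}_h)$.
Step 10. Calculate $\mathbf{r}_h$, the loading vector of $\mathbf{F}_{h-1}$: $\mathbf{r}_h = \mathbf{F}_{h-1}^{T}\mathbf{u}_h / (\mathbf{u}_h^{T}\mathbf{u}_h)$.
Step 11. Use $\mathbf{t}_h$ and $\mathbf{u}_h$ as the input and output of an NN to build the nonlinear internal model of PLS, $\hat{\mathbf{u}}_h = N_h(\mathbf{t}_h)$. The training objective is $J = \sum_{i=1}^{n}(u_{h,i} - \hat{u}_{h,i})^2$, where $\hat{u}_{h,i}$ is the NN output for input $t_{h,i}$ and $n$ is the number of samples.
Step 12. Calculate the residual matrices for the $h$th principal component: $\mathbf{E}_h = \mathbf{E}_{h-1} - \mathbf{t}_h\mathbf{p}_h^{T}$ and $\mathbf{F}_h = \mathbf{F}_{h-1} - \hat{\mathbf{u}}_h\mathbf{r}_h^{T}$.
Step 13. Let $h = h + 1$, reinitialize $\mathbf{u}_h$ with a column of $\mathbf{F}_{h-1}$, and repeat Steps 2 to 12 to calculate the next principal component until the rank of $\mathbf{E}_{h-1}$ is zero. Then, output the results.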
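The following is a minimal NumPy sketch of Steps 1 to 13. It assumes mean-centered data, the standard NIPALS formulas (Geladi and Kowalski), and one single-input, single-output inner model per latent variable. The names `extract_component`, `fit_npls`, `predict_npls`, `linear_inner`, and `quadratic_inner` are hypothetical. A linear inner model reproduces ordinary PLS, and a quadratic one mimics traditional nonlinear PLS. A trained BPNN or ANFIS would be passed as `fit_inner` in the same way. The number of latent variables is given explicitly here, as it is chosen by cross validation in Section 3, rather than iterating until the rank of $\mathbf{E}$ reaches zero.

```python
import numpy as np


def extract_component(E, F, tol=1e-12, max_iter=500):
    """Steps 1-10: one pair of latent variables from residuals E (n x m) and F (n x k)."""
    u = F[:, [0]].copy()                          # Step 1: start u from a column of F
    t_old = np.zeros((E.shape[0], 1))
    for _ in range(max_iter):
        w = E.T @ u / (u.T @ u)                   # Step 2: X weight vector
        w /= np.linalg.norm(w)                    # Step 3: normalize w
        t = E @ w                                 # Step 4: X score vector
        q = F.T @ t / (t.T @ t)                   # Step 5: Y weight vector
        q /= np.linalg.norm(q)                    # Step 6: normalize q
        u = F @ q                                 # Step 7: Y score vector
        if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):  # Step 8: converged
            break
        t_old = t
    p = E.T @ t / (t.T @ t)                       # Step 9: X loading vector
    r = F.T @ u / (u.T @ u)                       # Step 10: Y loading vector
    return w, t, p, r, u


def linear_inner(t, u):
    """Linear inner model u = b t; with it the procedure reduces to ordinary PLS."""
    b = (t @ u) / (t @ t)
    return lambda s: b * s


def quadratic_inner(t, u):
    """Fixed quadratic inner model, as in traditional nonlinear PLS."""
    return np.poly1d(np.polyfit(t, u, 2))


def fit_npls(X, Y, n_lv, fit_inner=linear_inner):
    """Outer PLS model with a pluggable inner model (Steps 11-13)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    comps = []
    for _ in range(n_lv):                         # Step 13: one pass per latent variable
        w, t, p, r, u = extract_component(E, F)
        inner = fit_inner(t.ravel(), u.ravel())   # Step 11: train inner model t -> u
        u_hat = inner(t.ravel())[:, None]
        E = E - t @ p.T                           # Step 12: deflate X
        F = F - u_hat @ r.T                       # Step 12: deflate Y with u_hat
        comps.append((w, p, r, inner))
    return x_mean, y_mean, comps


def predict_npls(model, X):
    """Apply the stored weights, loadings, and inner models to new rows of X."""
    x_mean, y_mean, comps = model
    E = X - x_mean
    Y_hat = np.tile(y_mean, (X.shape[0], 1))
    for w, p, r, inner in comps:
        t = E @ w
        Y_hat = Y_hat + inner(t.ravel())[:, None] @ r.T
        E = E - t @ p.T
    return Y_hat
```

`Y` is expected as an $n \times k$ array (for one target station, `k = 1`). Keeping the inner model as a callable is what lets the outer PLS loop stay unchanged while the regression inside it becomes nonlinear.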
Since the proposed method retains the external model of PLS and uses an NN to build the internal model, it combines the robustness of PLS with the adaptive learning capability of NN. The data are first mapped onto latent variables by the external model of PLS and then used to train the NN. In this way, the multivariate modeling problem is decomposed into separate inner models, one per latent variable. Hence, correlated (redundant) information in the data is removed, and the number of independent network weights is reduced. As a result, the noise sensitivity and the local minima problem of NN can be avoided to a certain extent.
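The following is a back-of-the-envelope parameter count. It assumes, as Step 11 suggests though the text does not state it outright, that each inner network maps a single score $t_h$ to $u_h$, and that it uses the 5 hidden nodes given in Section 3. The helper `mlp_param_count` is hypothetical.

```python
def mlp_param_count(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected n_in-n_hidden-n_out network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


full_bpnn = mlp_param_count(4, 5, 1)   # BPNN on the four raw station inputs
inner_net = mlp_param_count(1, 5, 1)   # one inner network mapping t_h to u_h
print(full_bpnn, inner_net)            # 31 16
```

Each inner network thus fits 16 parameters to a one-dimensional problem, whereas the direct BPNN fits 31 parameters to a four-dimensional one. With $a$ latent variables, the inner networks together hold $16a$ parameters.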
For the proposed method, two typical neural networks are used for the internal model: the back propagation neural network (BPNN) and the adaptive neuro-fuzzy inference system (ANFIS). A BPNN usually includes three layers: the input layer, the hidden layer, and the output layer [21–23]. The transfer function of the hidden layer is usually the hyperbolic tangent sigmoid (tansig) function, and the transfer function of the output layer is the linear (purelin) function. Each neuron applies its activation function to a weighted sum of its inputs plus a bias. The number of neurons in the hidden layer is usually determined by expert knowledge. BPNN adopts the back propagation algorithm to train these weights and biases. ANFIS is one of the most commonly used learning systems based on Takagi-Sugeno fuzzy rules. ANFIS adopts the subtractive clustering algorithm to determine the initial rules and then uses a network with five fixed layers to tune the rule parameters [25, 26]. In layer 1, each node outputs the membership value of an input with respect to its linguistic term, and the standard Gaussian function is used as the membership function. In layer 2, each node acts as a simple multiplier, and its output represents the firing strength of a rule. In layer 3, each node calculates the ratio of the firing strength of a rule to the sum of the firing strengths of all rules. In layer 4, each node calculates the contribution of a rule to the overall output. This contribution is the product of the normalized firing strength and the consequent function, which is a first-order polynomial. Layer 5 computes the overall output as the summation of all incoming signals. The next section evaluates the effectiveness of the proposed method experimentally.
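The five layers can be traced in a minimal NumPy sketch of the forward pass of a first-order Takagi-Sugeno ANFIS with Gaussian membership functions. It assumes one Gaussian per input per rule, which is the layout obtained when each subtractive-clustering center becomes a rule. The function names and array shapes are hypothetical, and training (tuning of centers, widths, and consequent parameters) is omitted.

```python
import numpy as np


def gaussian_mf(x, c, sigma):
    """Layer 1: Gaussian membership degree of x in a linguistic term."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)


def anfis_forward(x, centers, sigmas, consequents):
    """Forward pass of a first-order Takagi-Sugeno ANFIS.

    x           : (d,) input vector
    centers     : (R, d) Gaussian centers, one row per rule
    sigmas      : (R, d) Gaussian widths
    consequents : (R, d + 1) linear consequent parameters [p_1 .. p_d, r]
    """
    mu = gaussian_mf(x, centers, sigmas)                 # Layer 1: (R, d) memberships
    w = np.prod(mu, axis=1)                              # Layer 2: firing strength per rule
    w_bar = w / np.sum(w)                                # Layer 3: normalized firing strengths
    f = consequents[:, :-1] @ x + consequents[:, -1]     # first-order consequent per rule
    contributions = w_bar * f                            # Layer 4: rule contributions
    return float(np.sum(contributions))                  # Layer 5: overall output


if __name__ == "__main__":
    centers = np.array([[0.0], [1.0]])
    sigmas = np.array([[0.5], [0.5]])
    consequents = np.array([[1.0, 0.0], [1.0, 0.0]])     # both rules: f = x
    print(anfis_forward(np.array([0.3]), centers, sigmas, consequents))
```

When every rule has the consequent $f = x$, the output equals the input whatever the firing strengths, which gives a quick sanity check of layers 3 to 5.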
3. Experimental Results
In this section, we use BPNN, ANFIS, PLS, the nonlinear PLS with the internal model of BPNN (NPLSB), and the nonlinear PLS with the internal model of ANFIS (NPLSA) to build prediction models for the atmospheric pressure and the air temperature. The measured values of the atmospheric pressure and the air temperature used in the experiments are obtained from five automated weather stations of Shaanxi Province, China: the Xi’an station (XiA), the Lin Tong station (LinT), the Xian Yang station (XianY), the Lan Tian station (LanT), and the Jing Yang station (JingY). LinT, XianY, LanT, and JingY surround XiA; their locations are shown in Figure 2.
Each automated weather station records the atmospheric pressure and the air temperature hourly. The data recorded from March 31, 2013, 7:00 am, to April 5, 2013, 7:00 am, are used in the experiments and are divided into a training set and a test set. The data recorded from April 4, 2013, 8:00 am, to April 5, 2013, 7:00 am, form the test set, that is, 24 consecutive samples. The remaining data form the training set. The test set for the atmospheric pressure and the air temperature is listed in Table 1.
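The sample counts of this split can be checked with Python's standard `datetime` module. The check assumes that no hourly record is missing and that both endpoints of each period are included; the helper `hourly_count` is hypothetical.

```python
from datetime import datetime, timedelta


def hourly_count(start, end):
    """Number of hourly records from start to end, both endpoints included."""
    return int((end - start) / timedelta(hours=1)) + 1


n_total = hourly_count(datetime(2013, 3, 31, 7), datetime(2013, 4, 5, 7))
n_test = hourly_count(datetime(2013, 4, 4, 8), datetime(2013, 4, 5, 7))
n_train = n_total - n_test
print(n_total, n_test, n_train)  # 121 24 97
```

The 24 test samples agree with the count stated in the text. The training count of 97 holds only under the no-missing-records assumption.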
For the experiments, the meteorological elements of LinT, XianY, LanT, and JingY are the input variables, and that of XiA is the output variable. For BPNN, the classical three-layer structure is adopted. Since the numbers of input variables and output variables are 4 and 1, respectively, the numbers of nodes in the input layer, the hidden layer, and the output layer of BPNN are 4, 5, and 1, respectively. The learning rate of BPNN is 0.01. For ANFIS, the numbers of input variables and output variables are also 4 and 1. The initial step size is 0.01, the step size decrease rate is 0.7, and the step size increase rate is 1.3. For the subtractive clustering algorithm in ANFIS, the cluster radius (radii) is 0.3. For both BPNN and ANFIS, the number of iterations is 100 and the training error goal is zero. For NPLSB, the number of hidden layer nodes is still 5, and the other parameters are the same as those of BPNN. The related parameters of NPLSA are the same as those of ANFIS. For PLS, NPLSB, and NPLSA, the number of latent variables is determined according to the root-mean-squared error of leave-one-out cross validation (RMSECV), which is
$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2},$$
where $\hat{y}_i$ and $y_i$ are the prediction values and the measured values of the cross validation set, respectively, and $n$ is the number of samples in the cross validation set.
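The selection of the number of latent variables by leave-one-out RMSECV can be sketched as follows. For brevity the sketch uses a linear PLS1 model (a single output variable, as for XiA here) implemented with NIPALS in NumPy. In NPLSB or NPLSA, the linear inner coefficient `b` would be replaced by the trained network. All function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Linear PLS1 (one output variable) via NIPALS with n_lv latent variables."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    comps = []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)       # X weight vector
        t = E @ w                       # X score vector
        p = E.T @ t / (t @ t)           # X loading vector
        b = (f @ t) / (t @ t)           # linear inner coefficient
        E = E - np.outer(t, p)          # deflate X
        f = f - b * t                   # deflate y
        comps.append((w, p, b))
    return x_mean, y_mean, comps


def pls1_predict(model, X):
    x_mean, y_mean, comps = model
    E = X - x_mean
    y_hat = np.full(X.shape[0], y_mean)
    for w, p, b in comps:
        t = E @ w
        y_hat = y_hat + b * t
        E = E - np.outer(t, p)
    return y_hat


def rmsecv_loo(X, y, n_lv):
    """RMSECV: each sample is predicted by a model trained on the other n - 1 samples."""
    n = X.shape[0]
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_lv)
        pred[i] = pls1_predict(model, X[i:i + 1])[0]
    return float(np.sqrt(np.mean((pred - y) ** 2)))


def choose_n_lv(X, y, max_lv):
    """Number of latent variables with the smallest RMSECV, plus all RMSECV values."""
    scores = {a: rmsecv_loo(X, y, a) for a in range(1, max_lv + 1)}
    return min(scores, key=scores.get), scores
```

With as many latent variables as input stations, PLS1 coincides with ordinary least squares. A full-rank model is therefore the upper end of the search, and smaller values trade fit for robustness.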
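To estimate the effectiveness of the compared methods, the experiments adopt four criteria: RMSECV, the root-mean-squared error of prediction (RMSEP), the squared correlation coefficient of cross validation ($R^2_{\mathrm{CV}}$), and the squared correlation coefficient of prediction ($R^2_{\mathrm{P}}$).
RMSEP is defined as
$$\mathrm{RMSEP} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2},$$
where $\hat{y}_i$ and $y_i$ are the predicted values and the measured values of the test set, respectively, and $m$ is the number of samples in the test set.
$R^2_{\mathrm{CV}}$ is defined as
$$R^2_{\mathrm{CV}} = \frac{\operatorname{cov}^2(\hat{\mathbf{y}}, \mathbf{y})}{\operatorname{var}(\hat{\mathbf{y}})\operatorname{var}(\mathbf{y})},$$
where $\hat{\mathbf{y}}$ and $\mathbf{y}$ are the prediction values and the measured values of the cross validation set, $\operatorname{cov}(\cdot)$ is the covariance operation, and $\operatorname{var}(\cdot)$ is the variance operation.
$R^2_{\mathrm{P}}$ is defined as
$$R^2_{\mathrm{P}} = \frac{\operatorname{cov}^2(\hat{\mathbf{y}}, \mathbf{y})}{\operatorname{var}(\hat{\mathbf{y}})\operatorname{var}(\mathbf{y})},$$
where $\hat{\mathbf{y}}$ and $\mathbf{y}$ are now the predicted values and the measured values of the test set.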
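The following NumPy sketch computes these criteria. RMSECV and RMSEP share one formula and differ only in the set on which they are evaluated, and the same holds for the two squared correlation coefficients. The function names are hypothetical.

```python
import numpy as np


def rmse(y_pred, y_meas):
    """RMSECV or RMSEP: root-mean-squared error over the samples of one set."""
    y_pred, y_meas = np.asarray(y_pred, float), np.asarray(y_meas, float)
    return float(np.sqrt(np.mean((y_pred - y_meas) ** 2)))


def squared_corr(y_pred, y_meas):
    """Squared correlation coefficient: cov^2 / (var(y_pred) * var(y_meas))."""
    y_pred, y_meas = np.asarray(y_pred, float), np.asarray(y_meas, float)
    c = np.cov(y_pred, y_meas)  # 2 x 2 sample covariance matrix
    return float(c[0, 1] ** 2 / (c[0, 0] * c[1, 1]))
```

Because the squared correlation ignores offset and scale, a biased predictor such as $\hat{y} = 2y + 1$ still scores 1. This is why the error-based RMSECV and RMSEP are reported alongside it.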
In addition, the experiments are implemented in MATLAB 18.104.22.1682. The running environment is a general-purpose personal computer with an Intel Core i5-3570 CPU and 4 GB of RAM, running Microsoft Windows 7 Professional.
The experimental results for the atmospheric pressure are shown in Table 2. BPNN has the smallest RMSECV value and the largest squared correlation coefficient of cross validation. However, its RMSEP value is larger than those of PLS, NPLSB, and NPLSA, so its effectiveness is lower. The RMSEP value of ANFIS is the largest, and ANFIS is the least effective. Since the RMSEP value of NPLSB is smaller than that of BPNN, integrating BPNN with PLS can improve the prediction capability of BPNN. However, the RMSECV and RMSEP values of PLS are smaller than those of NPLSB, which suggests that the training of the internal model of NPLSB may have converged prematurely. The RMSEP value of NPLSA is the smallest; NPLSA combines the learning and optimization ability of NN with the humanlike reasoning of fuzzy logic. The experimental results verify that NPLSA is the most effective method for the atmospheric pressure. Figure 3 shows the scatter diagrams of measured values versus predicted values of BPNN, ANFIS, PLS, NPLSB, and NPLSA for the atmospheric pressure. For BPNN and ANFIS, a few samples lie far from the diagonal line. The points of PLS and NPLSB are mainly distributed on both sides of the diagonal line. Almost all of the points of NPLSA lie on the diagonal line, indicating the better prediction capability of NPLSA.
The experimental results for the air temperature are summarized in Table 3. Since the RMSECV and RMSEP values of ANFIS are the largest, ANFIS is the least effective. Although the RMSECV value of BPNN is the smallest, its RMSEP value is still relatively large. The RMSEP value of NPLSB is smaller than that of BPNN but larger than that of PLS. Hence, using BPNN to build the internal model of PLS may enhance the prediction capability to a certain extent. The RMSECV value and the RMSEP value of NPLSA are both the smallest. Therefore, the NPLSA model is again the most accurate. Figure 4 shows the scatter diagrams of measured values versus predicted values of BPNN, ANFIS, PLS, NPLSB, and NPLSA for the air temperature. Almost all of the points of BPNN and ANFIS lie far from the diagonal line. For NPLSB, a few points lie far from the diagonal line. The points of PLS and NPLSA are mainly distributed on both sides of the diagonal line. Moreover, the points of NPLSA are closer to the diagonal line than those of PLS, so the prediction capability of NPLSA is again better.
In summary, the experimental results verify that NPLSA has the highest predictive capability for both the atmospheric pressure and the air temperature. When the method is used for consistency analysis of meteorological data, a measured value is labeled as abnormal if its deviation from the prediction value exceeds a preset threshold. Hence, NPLSA can perform the consistency analysis of meteorological data correctly.
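The abnormality rule amounts to a residual test, sketched below. The text does not specify the threshold, so the threshold and the sample values in the usage lines are hypothetical.

```python
import numpy as np


def flag_abnormal(measured, predicted, threshold):
    """True where |measured - predicted| exceeds the preset threshold."""
    residual = np.abs(np.asarray(measured, float) - np.asarray(predicted, float))
    return residual > threshold


# Hypothetical values for illustration only (not from the paper's data).
measured = [12.1, 12.4, 19.0]
predicted = [12.0, 12.5, 12.6]
print(flag_abnormal(measured, predicted, threshold=3.0).tolist())  # [False, False, True]
```

In practice, the threshold would be set per meteorological element, for example from the spread of the model's residuals on the training set.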
4. Conclusions
In this paper, a nonlinear partial least squares method for consistency analysis of meteorological data is proposed. The proposed method has the following advantages. First, it can perform the consistency analysis of meteorological data. Second, it integrates the robustness of PLS with the adaptive learning capability of NN to predict meteorological data correctly. Third, the multivariate modeling problem is decomposed, because the data are mapped by the external model of PLS before being used to train the NN. Fourth, the noise sensitivity and the local minima problem of NN can be avoided to a certain extent. BPNN and ANFIS are used to build the internal model of PLS, and the experimental results verify the high effectiveness of NPLSA. Since the performance of the proposed method may be affected by the training process, future work will adopt more advanced schemes to further improve the training process.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by Technology Innovation Foundation of Shaanxi Meteorological Bureau (2015 M-53) and Locomotive Plan of Atmospheric Observation Technical Support Center of Shaanxi Province.
- R. Kuśmierek-Tomaszewska, J. Żarski, and S. Dudek, “Meteorological automated weather station data application for plant water requirements estimation,” Computers and Electronics in Agriculture, vol. 88, pp. 44–51, 2012.
- J. Estévez, P. Gavilán, and J. V. Giráldez, “Guidelines on validation procedures for meteorological data from automatic weather stations,” Journal of Hydrology, vol. 402, no. 1-2, pp. 144–154, 2011.
- R. Steinacker, D. Mayer, and A. Steiner, “Data quality control based on self-consistency,” Monthly Weather Review, vol. 139, no. 12, pp. 3974–3991, 2011.
- M. Williams, D. Cornford, L. Bastin, R. Jones, and S. Parker, “Automatic processing, quality assurance and serving of real-time weather data,” Computers & Geosciences, vol. 37, no. 3, pp. 353–362, 2011.
- Z. He, X. Feng, L. He, and H. Wang, “Examination of meteorological data by the horizontal space consistency analysis from four directions,” Meteorological Monthly, vol. 36, no. 5, pp. 118–122, 2010.
- Z. Klausner and E. Fattal, “An objective and automatic method for identification of pattern changes in wind direction time series,” International Journal of Climatology, vol. 31, no. 5, pp. 783–790, 2011.
- W. Haijun and L. Ying, “Comprehensive consistency method of data quality controlling with its application to daily temperature,” Journal of Applied Meteorological Science, vol. 23, no. 1, pp. 69–76, 2012.
- G. Sciuto, B. Bonaccorso, A. Cancelliere, and G. Rossi, “Probabilistic quality control of daily temperature data,” International Journal of Climatology, vol. 33, no. 5, pp. 1211–1227, 2013.
- L. Li, C.-Y. Xu, Z. Zhang, and S. K. Jain, “Validation of a new meteorological forcing data in analysis of spatial and temporal variability of precipitation in India,” Stochastic Environmental Research and Risk Assessment, vol. 28, no. 2, pp. 239–252, 2014.
- M. López-Lineros, J. Estévez, J. V. Giráldez, and A. Madueño, “A new quality control procedure based on non-linear autoregressive neural network for validating raw river stage data,” Journal of Hydrology, vol. 510, pp. 103–109, 2014.
- M. C. Valverde, E. Araujo, and H. C. Velho, “Neural network and fuzzy logic statistical downscaling of atmospheric circulation-type specific weather pattern for rainfall forecasting,” Applied Soft Computing Journal, vol. 22, pp. 681–694, 2014.
- P. Geladi and B. R. Kowalski, “Partial least-squares regression: a tutorial,” Analytica Chimica Acta, vol. 185, pp. 1–17, 1986.
- B. Li, Y. He, F. Guo, and L. Zuo, “A novel localization algorithm based on isomap and partial least squares for wireless sensor networks,” IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 2, pp. 304–314, 2013.
- J. Kuligowski, D. Pérez-Guaita, J. Escobar et al., “Evaluation of the effect of chance correlations on variable selection using partial least squares-discriminant analysis,” Talanta, vol. 116, pp. 835–840, 2013.
- A. L. Tomren and T. Barth, “Comparison of partial least squares calibration models of viscosity, acid number and asphaltene content in petroleum, based on GC and IR data,” Fuel, vol. 120, pp. 8–21, 2014.
- B. Zhong, X. Yuan, R. Ji et al., “Structured partial least squares for simultaneous object tracking and segmentation,” Neurocomputing, vol. 133, pp. 317–327, 2014.
- H. Cao, X. Yan, Y. Li, Y. Wang, Y. Zhou, and S. Yang, “A component prediction method for flue gas of natural gas combustion based on nonlinear partial least squares method,” The Scientific World Journal, vol. 2014, Article ID 418674, 5 pages, 2014.
- C. Zhong and Z. Li-Qing, “Spectral reconstruction and quantitative analysis by b-spline transformations and penalized partial least squares approach,” Chinese Journal of Analytical Chemistry, vol. 37, no. 12, pp. 1820–1824, 2009.
- S. Lee and W. S. Choi, “A multi-industry bankruptcy prediction model using back-propagation neural network and multivariate discriminant analysis,” Expert Systems with Applications, vol. 40, no. 8, pp. 2941–2946, 2013.
- J. Cheng, H. Zhu, Y. Ding, S. Zhong, and Q. Zhong, “Stochastic finite-time boundedness for Markovian jumping neural networks with time-varying delays,” Applied Mathematics and Computation, vol. 242, pp. 281–295, 2014.
- M. J. Diamantopoulou, P. E. Georgiou, and D. M. Papamichail, “Performance evaluation of artificial neural networks in estimating reference evapotranspiration with minimal meteorological data,” Global Nest Journal, vol. 13, no. 1, pp. 18–27, 2011.
- G.-Z. Ma, E. Song, C.-C. Hung, L. Su, and D.-S. Huang, “Multiple costs based decision making with back-propagation neural networks,” Decision Support Systems, vol. 52, no. 3, pp. 657–663, 2012.
- S. Tengeleng and A. Nzeukou, “Performance of using cascade forward back propagation neural networks for estimating rain parameters with rain drop size distribution,” Atmosphere, vol. 5, no. 2, pp. 454–472, 2014.
- J.-S. R. Jang, “ANFIS: adaptive-network-based fuzzy inference system,” IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
- S. M. Rahman, A. N. Khondaker, and R. A. Khan, “Ozone levels in the empty quarter of Saudi Arabia-application of adaptive neuro-fuzzy model,” Environmental Science and Pollution Research, vol. 20, no. 5, pp. 3395–3404, 2013.
- M. K. Goyal, B. Bharti, J. Quilty, J. Adamowski, and A. Pandey, “Modeling of daily pan evaporation in sub tropical climates using ANN, LS-SVR, fuzzy logic, and ANFIS,” Expert Systems with Applications, vol. 41, no. 11, pp. 5267–5276, 2014.
Copyright © 2015 Zhen Meng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.