#### Abstract

Real-world engineering processes are strongly coupled and nonlinear, and their mechanisms are complex. Building soft sensors with good performance and robustness has therefore become a core issue in industrial processes. In this paper, we propose a new soft sensor model based on an improved Elman neural network (Elman NN), together with a variable data preprocessing method. The improved Elman NN uses its context layer to provide local feedback and a feedforward mechanism, which lets it reflect the dynamic characteristics of the process accurately. As a result, it approximates delay systems well and adapts to time-varying characteristics. The preprocessing method combines Isometric Mapping (ISOMAP) with local linear embedding (LLE). It preserves both the neighborhood structure and the global mutual distances of the dataset while removing noise and data redundancy. The soft sensor model, built on the improved Elman NN with ISOMAP-LLE preprocessing, is applied in a practical sintering and pelletizing plant to estimate the temperature in the rotary kiln calcining process. Compared with several conventional soft sensor models, the proposed method shows better generalization performance and robustness, with higher prediction accuracy and stronger anti-interference ability. It is therefore an effective and promising method for industrial process applications.

#### 1. Introduction

A soft sensor is an inferential, virtual measurement technique. It uses easily measured variables to predict process variables that are difficult to measure directly because of technological or economic limitations or a harsh environment. A soft sensor establishes a regression model between the easily measured variables and the difficult-to-measure ones. This model solves the problem that such variables otherwise cannot be used as feedback signals in quality control systems. Soft sensor methods are now widely used in industrial processes and have become a major development trend in both academia and industry [1].

For model prediction in industrial production processes, early researchers proposed model predictive control methods such as generalized predictive control [2, 3], dynamic matrix control [4], and model algorithmic control [5]. However, these prediction methods share two common shortcomings. First, they cannot effectively handle predictive control of nonlinear systems. Second, the stability and robustness of multivariable predictive control algorithms remain unresolved, and accurate first-principles models of complex processes are very difficult to build. As research on soft sensor theory has deepened and engineering technology has advanced, data-driven artificial intelligence and machine learning methods have been proposed to estimate key process and quality variables that are hard to measure. Examples include artificial neural networks (ANN) [6–8], rough sets [9], partial least squares [10], support vector machines (SVM) [11–13], and hybrid methods [14, 15]. Advanced predictive control based on neural networks has also been studied; for example, Ławryńczuk proposed MPC algorithms based on double-layer perceptron neural models [16]. These data-driven soft sensors have been implemented successfully in many industrial fields, such as chemical and metallurgical engineering.

However, as the number of variables increases, the industrial model becomes more complex and the curse of dimensionality arises. The variables are also strongly correlated and contain considerable redundancy. In addition, the collected data may be contaminated by random noise or by data from abnormal working conditions, so the process variables used for soft sensing are stochastic. If these samples are used directly to build the soft sensor model, its performance cannot be guaranteed, and prediction accuracy may be poor. It is therefore important to discover the general trends of the data with latent variable models before soft sensor modeling. The randomness of the samples should be taken into account, and redundant samples should be discarded selectively using appropriate variable selection techniques. The core task is to exploit the essential information in the effective sample data and build a soft sensor model with good performance and strong robustness. These shortcomings of raw sample data have driven the development of dimension reduction methods, and many data preprocessing methods for soft sensor models have been studied in recent years. Sun et al. [17] used principal component analysis (PCA) to analyze the correlation of input variables, which effectively reduced the input dimension of an LS-SVM-based chemical oxygen demand prediction model. Najafi et al. [18] proposed “Path-Based Isomap”, which computes a low-dimensional embedding with a geodesic path-mapping algorithm. Other manifold learning methods include Laplacian eigenmaps [19, 20], locally linear embedding [21], and Isometric Mapping [22, 23].

However, each of these linear and nonlinear preprocessing methods has its own limitation: it preserves only a single feature of the dataset and ignores other valuable characteristics. To overcome this problem, we propose a data preprocessing method for the input variable dataset that combines Isometric Mapping (ISOMAP) with local linear embedding (LLE). On this basis, this paper builds a soft sensor model on an improved Elman NN whose input data are preprocessed by integrating ISOMAP and LLE under a kernel framework. The model estimates the quality variable with better performance and robustness.

The rest of this paper is organized as follows. Section 2 describes the improved Elman NN and the data preprocessing method. Section 3 presents, in detail, the soft sensor model based on the improved Elman NN with variable data selection by combining ISOMAP with LLE. Section 4 gives an industrial case study on predicting the temperature in the rotary kiln calcining process of pellet sintering. Finally, Section 5 concludes the paper.

#### 2. Background Theory

For convenience, some definitions that will be used throughout this paper are given: NN: neural network; ISOMAP: Isometric Mapping; LLE: local linear embedding.

##### 2.1. Improved Elman Neural Network

The Elman neural network (Elman NN) [24–27] is a typical dynamic neural network that adds a recurrent feedback (context) layer to the BP neural network. It stores its internal state in the context layer, which lets it map dynamic characteristics. Its self-connection makes it sensitive to historical states, and it approximates nonlinear mappings better than the BP NN. However, this feedback acts only on the hidden layer and only adjusts the internal feedback weights. The network therefore lacks global adjustment ability and dynamic stability, and its learning and identification abilities are limited.

To further improve the dynamic performance and nonlinear approximation capability of the Elman NN, the output of the context layer should be fully exploited as a feedback signal. Figure 1 shows the architecture of the improved Elman NN. It consists of an input layer, a hidden layer, a context layer, and an output layer, and the context layer is connected to both the hidden layer neurons and the output layer neurons.

As Figure 1 shows, the improved Elman NN feeds an additional signal from the context layer nodes to the output layer through a new connection weight matrix $W^{(4)}$ between the context layer and the output layer. After processing in the hidden layer, the hidden-layer output returns to the context layer. The intermediate state of the context layer is then sent to the output layer and the hidden layer through the connection weight matrices $W^{(4)}$ and $W^{(2)}$, respectively, which forms a feedforward and local feedback network. The feedback loop runs through the hidden layer and the context layer, and the feedforward path runs through the input layer, hidden layer, context layer, and output layer, as shown in Figure 2.

We define the following symbols. The numbers of input layer nodes, output layer nodes, and hidden layer nodes are $n$, $q$, and $r$, respectively; the context layer also has $r$ nodes. The network involves the input $u(k) \in \mathbb{R}^{n}$, the intermediate state $x(k) \in \mathbb{R}^{r}$, and the response output $\hat{y}(k) \in \mathbb{R}^{q}$. The activation functions from the input layer to the hidden layer and from the hidden layer to the output layer are $f$ and $g$, respectively.

The input layer is mapped to the hidden layer through the activation function $f$:

$$x(k) = f\big(W^{(1)} u(k) + W^{(2)} x_c(k) + b^{(1)}\big), \tag{1}$$

where $f$ represents the *sigmoid* function. Equation (1) is parameterized by $\theta_1 = \{W^{(1)}, W^{(2)}, b^{(1)}\}$. Here $w^{(1)}_{ji}$ is the weight from input layer node $i$ to hidden layer node $j$, $w^{(2)}_{jl}$ is the weight from context layer node $l$ to hidden layer node $j$, and $b^{(1)}$ is the bias vector of the hidden layer.

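The hidden layer is mapped to the output space through the activation function $g$, and $\hat{y}(k)$ is the output vector of the network:

$$\hat{y}(k) = g\big(W^{(3)} x(k) + W^{(4)} x_c(k) + b^{(2)}\big). \tag{2}$$

Equation (2) is determined by $\theta_2 = \{W^{(3)}, W^{(4)}, b^{(2)}\}$. Here $w^{(3)}_{hj}$ is the weight from hidden layer node $j$ to output layer node $h$, $w^{(4)}_{hl}$ is the connection weight from context layer node $l$ to output layer node $h$, and $b^{(2)}$ is the bias vector of the output layer.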

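In the training process, the output of the context layer is the sum of the hidden-layer output at the previous time and $\alpha$ times the context-layer output at the previous time:

$$x_c(k) = x(k-1) + \alpha\, x_c(k-1), \tag{3}$$

where $\alpha$ is the self-feedback gain of the context layer.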
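The forward pass in equations (1)–(3) can be sketched in NumPy as follows. This is a minimal illustration, and several of its choices are assumptions not fixed by the text. The output activation $g$ is taken to be linear, a common choice for regression. The hidden and context states start at zero. The function name `elman_forward` and the default gain `alpha=0.5` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elman_forward(U, W1, W2, W3, W4, b1, b2, alpha=0.5):
    """Run the improved Elman NN over an input sequence U of shape (T, n).

    W1: (r, n) input -> hidden      W2: (r, r) context -> hidden
    W3: (q, r) hidden -> output     W4: (q, r) context -> output (added link)
    Returns predictions of shape (T, q). Linear output g is an assumption.
    """
    r = W1.shape[0]
    x_prev = np.zeros(r)  # hidden state x(k-1), assumed zero at start
    xc = np.zeros(r)      # context state x_c(k-1), assumed zero at start
    outputs = []
    for k in range(U.shape[0]):
        xc = x_prev + alpha * xc                # eq. (3): context update
        x = sigmoid(W1 @ U[k] + W2 @ xc + b1)   # eq. (1): hidden layer
        y_hat = W3 @ x + W4 @ xc + b2           # eq. (2) with linear g
        outputs.append(y_hat)
        x_prev = x
    return np.array(outputs)


# Usage with random weights (illustrative sizes only)
rng = np.random.default_rng(0)
n, r, q, T = 4, 6, 1, 10
U = rng.normal(size=(T, n))
W1, W2 = rng.normal(size=(r, n)), rng.normal(size=(r, r))
W3, W4 = rng.normal(size=(q, r)), rng.normal(size=(q, r))
b1, b2 = np.zeros(r), np.zeros(q)
print(elman_forward(U, W1, W2, W3, W4, b1, b2).shape)  # (10, 1)
```

Setting `W4` to zero recovers the conventional Elman output layer, which makes the effect of the added context-to-output connection easy to isolate.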

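The loss function is defined as the squared error

$$E(k) = \frac{1}{2}\,\big\lVert y(k) - \hat{y}(k) \big\rVert^{2}, \tag{4}$$

where $\hat{y}(k)$ is the predicted output value at time $k$ and $y(k)$ is the actual output value at time $k$.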

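The parameter vector $\theta = \{W^{(1)}, W^{(2)}, W^{(3)}, W^{(4)}, b^{(1)}, b^{(2)}\}$ of the improved Elman NN is optimized to minimize the average squared error over $N$ samples:

$$\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{k=1}^{N} E(k). \tag{5}$$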
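The objective in equations (4) and (5) can be sketched as below. The sketch assumes the $\frac{1}{2}$ scaling of the squared error, which is a convention rather than something the text specifies. It also assumes arrays shaped `(N, q)`. The helper names are hypothetical.

```python
import numpy as np


def squared_error(y_true, y_pred):
    """Per-sample loss E(k) = 1/2 * ||y(k) - y_hat(k)||^2, eq. (4); rows are samples."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return 0.5 * np.sum(diff ** 2, axis=-1)


def average_loss(y_true, y_pred):
    """Empirical average of E(k) over the N samples: the objective of eq. (5)."""
    return float(np.mean(squared_error(y_true, y_pred)))
```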

##### 2.2. Input Variable Data Selection and Dimension Reduction Method

The original input variables of the improved Elman NN contain redundant data and data from abnormal operating conditions. To deal with this, we propose a variable data selection methodology that integrates Isometric Mapping (ISOMAP) [28] and local linear embedding (LLE) [29, 30]. It handles measurement uncertainty and determines the dimensionality of the latent variables. The aim is to combine the two manifold learning algorithms under a kernel framework so that several characteristics of the dataset are preserved at once. The result is a manifold-learning dimensionality reduction method for high-dimensional nonlinear datasets. It addresses the network-topology and training difficulties caused by oversized input vectors. The methodology has three stages. First, the Mercer kernel matrix of ISOMAP is constructed. Second, the kernel matrix of LLE is extracted. Third, the two are combined into a single kernel matrix under the kernel framework.

The general steps of the dimensionality reduction process are as follows.

*Step 1.* Use the nearest-neighbor method (a fixed number of neighbors) or the radius method (all points within a given distance) to find the neighborhood of each data point.

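*Step 2.* Use the shortest-path method to calculate the geodesic distances and form the geodesic distance matrix $D = [d_{ij}]_{N \times N}$, in which $d_{ij}$ is the approximate geodesic distance between $x_i$ and $x_j$. The matrix $\tau(D^{2})$ is constructed from the squared approximate geodesic distance matrix $D^{2} = [d_{ij}^{2}]$:

$$\tau(D^{2}) = -\frac{1}{2} H D^{2} H, \tag{6}$$

where $H$ represents a centralization matrix, $H = I - \frac{1}{N} e e^{\top}$, and $e$ is the $N$-dimensional all-ones vector.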
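Steps 1 and 2 can be sketched with plain NumPy as below. Several choices here are illustrative assumptions. The neighborhood uses a fixed number of nearest neighbors, and the shortest-path method is Floyd-Warshall. The operator `tau` implements the double centering of equation (6). The function names are assumptions, and the neighborhood graph is assumed to be connected (otherwise some distances stay infinite).

```python
import numpy as np


def knn_graph(X, n_neighbors):
    """Symmetric k-NN graph: Euclidean edge lengths, np.inf where there is no edge."""
    N = X.shape[0]
    E = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(E, axis=1)[:, 1:n_neighbors + 1]  # position 0 is the point itself
    for i in range(N):
        G[i, nbrs[i]] = E[i, nbrs[i]]
    return np.minimum(G, G.T)  # keep an edge if either endpoint selected it


def geodesic_distances(G):
    """Approximate geodesic distances via Floyd-Warshall shortest paths (Step 2)."""
    D = G.copy()
    for m in range(D.shape[0]):
        D = np.minimum(D, D[:, [m]] + D[[m], :])
    return D


def tau(A):
    """Double centering tau(A) = -1/2 * H A H with H = I - e e^T / N, eq. (6)."""
    N = A.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return -0.5 * H @ A @ H
```

For thousands of samples the $O(N^3)$ loop becomes slow, and `scipy.sparse.csgraph.shortest_path` on a sparse neighborhood graph is a more practical choice. When the geodesic distances happen to be Euclidean, `tau(D ** 2)` equals the Gram matrix of the centered data, which is the classical MDS identity.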

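*Step 3.* Construct the following matrix from $\tau(D^{2})$ and $\tau(D) = -\frac{1}{2} H D H$:

$$\begin{bmatrix} 0 & 2\tau(D^{2}) \\ -I & -4\tau(D) \end{bmatrix}. \tag{7}$$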

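*Step 4.* Obtain the maximum eigenvalue $c^{*}$ of the matrix in equation (7) and construct the Mercer kernel matrix with a constant $c$:

$$K_{\mathrm{ISOMAP}} = \tau(D^{2}) + 2c\,\tau(D) + \frac{1}{2} c^{2} H. \tag{8}$$

To ensure that $K_{\mathrm{ISOMAP}}$ is a real symmetric positive semidefinite matrix, $c \ge c^{*}$ must hold.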
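Steps 3 and 4 match the constant-shifting construction used in kernel Isomap, sketched below. The sketch assumes `D` is the symmetric geodesic distance matrix from Step 2. The block matrix of equation (7) is not symmetric, so its eigenvalues can be complex. The sketch conservatively takes the largest real part, which is never smaller than the largest real eigenvalue. This relies on the constant-shifting result that any $c \ge c^{*}$ keeps the kernel positive semidefinite. The function name is hypothetical.

```python
import numpy as np


def tau(A):
    """Double centering -1/2 * H A H."""
    N = A.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return -0.5 * H @ A @ H


def isomap_mercer_kernel(D):
    """Mercer kernel of ISOMAP via constant shifting (Steps 3-4)."""
    N = D.shape[0]
    T2, T1 = tau(D ** 2), tau(D)
    # Block matrix of eq. (7)
    B = np.block([[np.zeros((N, N)), 2.0 * T2],
                  [-np.eye(N), -4.0 * T1]])
    # Largest real part: conservative estimate of c*, never below it
    c_star = float(np.max(np.linalg.eigvals(B).real))
    c = max(c_star, 0.0)  # any c >= c* is admissible
    H = np.eye(N) - np.ones((N, N)) / N
    K = T2 + 2.0 * c * T1 + 0.5 * c ** 2 * H  # eq. (8)
    return 0.5 * (K + K.T), c_star            # symmetrize against round-off
```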

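*Step 5.* For each data point $x_i$, $i = 1, \dots, N$, of the LLE algorithm, solve the cost function

$$\varepsilon(W) = \sum_{i=1}^{N} \Big\lVert x_i - \sum_{j} w_{ij} x_j \Big\rVert^{2}, \quad \text{s.t. } \sum_{j} w_{ij} = 1, \tag{9}$$

where $w_{ij} = 0$ whenever $x_j$ is not a neighbor of $x_i$.

Solving equation (9) gives the weight coefficients $w_{ij}$, from which the weight matrix $W = [w_{ij}]$ is constructed. A real symmetric positive semidefinite matrix $M = (I - W)^{\top}(I - W)$ is then obtained from $W$, and the kernel matrix of the LLE algorithm is extracted:

$$K_{\mathrm{LLE}} = \lambda_{\max} I - M, \tag{10}$$

where $\lambda_{\max}$ is the largest eigenvalue of $M$.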
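Step 5 can be sketched as follows. The sketch assumes a fixed number of neighbors. It also adds a small trace-scaled regularizer to each local Gram matrix, which LLE needs when there are more neighbors than input dimensions; the value `reg=1e-3` is an assumption. Taking $\lambda_{\max}$ as the largest eigenvalue of $M$ makes $K_{\mathrm{LLE}}$ positive semidefinite by construction. The function names are hypothetical.

```python
import numpy as np


def lle_weights(X, n_neighbors, reg=1e-3):
    """Row-normalized reconstruction weights minimizing eq. (9)."""
    N = X.shape[0]
    W = np.zeros((N, N))
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    for i in range(N):
        nbrs = np.argsort(dist[i])[1:n_neighbors + 1]  # skip the point itself
        Z = X[nbrs] - X[i]                             # neighbors centered on x_i
        C = Z @ Z.T                                    # local Gram matrix
        C = C + reg * max(np.trace(C), 1e-12) * np.eye(len(nbrs))  # assumed regularizer
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                       # enforce sum_j w_ij = 1
    return W


def lle_kernel(W):
    """K_LLE = lambda_max * I - M with M = (I - W)^T (I - W), eq. (10)."""
    I = np.eye(W.shape[0])
    M = (I - W).T @ (I - W)
    lam_max = np.linalg.eigvalsh(M)[-1]  # eigvalsh returns ascending eigenvalues
    return lam_max * I - M
```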

*Step 6.* Obtain the kernel matrix $K$ from the kernel matrix $K_{\mathrm{ISOMAP}}$ of the ISOMAP algorithm and the kernel matrix $K_{\mathrm{LLE}}$ of the LLE algorithm. A kernel matrix regulator $\mu$ is introduced to adjust the relative weight of the ISOMAP and LLE algorithms, and the new kernel matrix is constructed as

$$K = K_{\mathrm{ISOMAP}} + \mu K_{\mathrm{LLE}}. \tag{11}$$

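*Step 7.* Calculate the eigenvalue matrix $\Lambda$ and the eigenvector matrix $V$ of the new kernel matrix $K$, and obtain the low-dimensional embedded coordinates based on *MDS*:

$$Y = V_d \Lambda_d^{1/2}, \tag{12}$$

where $\Lambda_d$ is the diagonal matrix of the $d$ largest eigenvalues of $K$ and $V_d$ contains the corresponding eigenvectors.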
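Steps 6 and 7 can be sketched as below. The text does not spell out how the regulator enters the fused kernel, so this sketch assumes the additive form $K = K_{\mathrm{ISOMAP}} + \mu K_{\mathrm{LLE}}$ with $\mu \ge 0$. A nonnegative weighted sum of positive semidefinite kernels stays positive semidefinite. The embedding keeps the $d$ largest eigenpairs, as in classical MDS. The function name `fused_embedding` is hypothetical.

```python
import numpy as np


def fused_embedding(K_iso, K_lle, mu, d):
    """Fuse the two kernels and return N x d MDS coordinates Y = V_d Lambda_d^(1/2)."""
    K = K_iso + mu * K_lle            # eq. (11), assumed additive form
    K = 0.5 * (K + K.T)               # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(K)    # ascending eigenvalues
    top = np.argsort(vals)[::-1][:d]  # indices of the d largest eigenvalues
    lam = np.clip(vals[top], 0.0, None)
    return vecs[:, top] * np.sqrt(lam)  # eq. (12)
```

With `mu=0` the embedding reduces to kernel Isomap alone. Larger `mu` gives more weight to the local neighborhood structure preserved by LLE.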

#### 3. Soft Sensor Model Based on Improved Elman NN with Dimension-Reduction

##### 3.1. Soft Sensor Model Based on Improved Elman NN with Variable Data Preprocessing

The soft sensor model based on the improved Elman NN with variable data selection aims to exploit the essential information in the process data and to filter out redundant variable data. This paper proposes an input variable selection and dimension-reduction methodology that integrates the ISOMAP and LLE manifold learning algorithms. It effectively handles redundant variable data and abnormal working-condition data in oversized sample sets. The structure of the soft sensor model is shown in Figure 3. It consists of two parts: input data selection (ISOMAP and LLE) and supervised learning (improved Elman NN). The variable screening method fuses LLE and ISOMAP under the kernel framework and follows two principles. (1) The initial input variables should be chosen reasonably to ensure their quality. (2) Useful information in the input variables should not be eliminated prematurely. In addition, each layer is trained with a supervised criterion, and the resulting weights are used as initial weights instead of random ones.

In Figure 3, $W^{(1)}$, $W^{(2)}$, $W^{(3)}$, and $W^{(4)}$ are the weights of the corresponding layers of the improved Elman NN, optimized with respect to a supervised training criterion. The input data of the soft sensor model consist of discarded sample data and training sample data. The discarded sample data are the noise and redundant values of the primary input variables. The training sample data $(u, y)$ are input-output pairs free of redundant and abnormal working-condition data, and they are used to train the improved Elman NN so that the soft sensor model is more accurate.

##### 3.2. Training Based on Supervised Learning

In the training process of the improved Elman NN, each layer in the feedforward path is treated as a one-layer BP network whose weights are updated by gradient descent. The hidden-layer output at time $k-1$ becomes the input of the context layer at time $k$, and the context layer feeds this previous hidden state back to the hidden layer. The error $E(k)$ is used to update the weights.

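The loss function is the traditional squared error of equation (4), and equation (5) is solved by choosing the parameters that minimize the average squared error. Gradient descent updates the parameters of the improved Elman NN, amending the weights $W^{(1)}$, $W^{(2)}$, $W^{(3)}$, and $W^{(4)}$ to achieve self-adaptive learning. At time $k$, the weight increments are

$$\Delta w^{(1)}_{ji}(k) = -\eta_1 \frac{\partial E(k)}{\partial w^{(1)}_{ji}}, \tag{13}$$

$$\Delta w^{(2)}_{jl}(k) = -\eta_2 \frac{\partial E(k)}{\partial w^{(2)}_{jl}}, \tag{14}$$

$$\Delta w^{(3)}_{hj}(k) = -\eta_3 \frac{\partial E(k)}{\partial w^{(3)}_{hj}}, \tag{15}$$

$$\Delta w^{(4)}_{hl}(k) = -\eta_4 \frac{\partial E(k)}{\partial w^{(4)}_{hl}}, \tag{16}$$

where $u_i(k)$, $i = 1, \dots, n$, is the signal of the input layer nodes and enters the gradient in (13) through equation (1), and $k = 1, 2, \dots$ is the current iteration. The learning rates of $W^{(1)}$, $W^{(2)}$, $W^{(3)}$, and $W^{(4)}$ are $\eta_1$, $\eta_2$, $\eta_3$, and $\eta_4$, respectively. The context state $x_c(k)$, which enters the gradients in (14) and (16), is obtained as in equation (3).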

A new set of weights is then obtained by

$$W^{(s)}(k+1) = W^{(s)}(k) + \Delta W^{(s)}(k), \quad s = 1, 2, 3, 4. \tag{17}$$

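Then, the prediction model of $\hat{y}$ is obtained with the new weight parameters $\theta^{*}$:

$$\hat{y}(k) = F\big(u(k), x_c(k); \theta^{*}\big), \tag{18}$$

where $F$ denotes the network mapping defined by equations (1)–(3).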
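One gradient-descent step for equations (13)–(17) can be sketched as follows. This is a truncated-gradient sketch that rests on assumptions not given in the text: a sigmoid hidden layer, a linear output layer, and the loss $E(k) = \frac{1}{2}\lVert y(k) - \hat{y}(k)\rVert^2$. The context state $x_c(k)$ is also treated as a constant input, so gradients are not propagated back through the recursion in equation (3). The bias learning rates reuse $\eta_1$ and $\eta_3$, and `train_step` is a hypothetical name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_step(params, u, xc, y_true, lr=(0.05, 0.05, 0.05, 0.05)):
    """One truncated-gradient update of (W1, W2, W3, W4, b1, b2) at time k.

    Each weight moves by Delta W = -eta * dE/dW, i.e. W(k+1) = W(k) + Delta W (eq. 17).
    """
    W1, W2, W3, W4, b1, b2 = params
    eta1, eta2, eta3, eta4 = lr
    x = sigmoid(W1 @ u + W2 @ xc + b1)          # eq. (1)
    y_hat = W3 @ x + W4 @ xc + b2               # eq. (2), linear output
    delta_o = y_hat - y_true                    # dE/dy_hat for E = 1/2 ||y - y_hat||^2
    delta_h = (W3.T @ delta_o) * x * (1.0 - x)  # back-propagated through the sigmoid
    new_params = (
        W1 - eta1 * np.outer(delta_h, u),       # eq. (13)
        W2 - eta2 * np.outer(delta_h, xc),      # eq. (14), xc held fixed
        W3 - eta3 * np.outer(delta_o, x),       # eq. (15)
        W4 - eta4 * np.outer(delta_o, xc),      # eq. (16)
        b1 - eta1 * delta_h,                    # bias rates: assumption
        b2 - eta3 * delta_o,
    )
    loss = 0.5 * float(delta_o @ delta_o)       # E(k) before the update
    return new_params, x, loss
```

In a full training loop, the returned hidden state `x` would feed the next context update `xc = x + alpha * xc`, as in equation (3). Propagating gradients through that recursion (dynamic back-propagation) is a possible refinement that this sketch deliberately omits.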

##### 3.3. Evaluation Criterion

After variable selection and training, the prediction of the soft sensor model given by equation (18) needs to be evaluated further. In this paper, the Bayesian information criterion (BIC) is adopted to balance the complexity of the soft sensor model against its accuracy:

$$\mathrm{BIC} = N \ln\!\Big(\frac{1}{N} \sum_{k=1}^{N} \big(y(k) - \hat{y}(k)\big)^{2}\Big) + p \ln N, \tag{19}$$

where $\hat{y}(k)$ is the prediction obtained with the modified weight parameters as in equation (18), $y(k)$ is the actual value, $N$ is the number of observations, and $p$ is the number of variables selected by integrating ISOMAP and LLE.
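A sketch of equation (19) follows, assuming the usual Gaussian-residual form of the BIC. In this form $N\ln(\mathrm{RSS}/N)$ measures the fit and $p\ln N$ penalizes the number of selected variables. The helper name `bic` is hypothetical.

```python
import numpy as np


def bic(y_true, y_pred, p):
    """BIC = N * ln(RSS / N) + p * ln(N); lower values are preferred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    N = y_true.size
    rss = float(np.sum((y_true - y_pred) ** 2))
    return N * np.log(rss / N) + p * np.log(N)
```

Because each extra variable costs $\ln N$, a larger variable set is preferred only if it lowers the residual sum of squares enough to offset that penalty.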

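The optimal parameters $\theta^{*}$ are obtained by minimizing the average BIC value:

$$\theta^{*} = \arg\min_{\theta} \overline{\mathrm{BIC}}(\theta), \tag{20}$$

where $\overline{\mathrm{BIC}}$ denotes the average of the BIC values.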

The procedure for building the soft sensor model based on the improved Elman NN with variable selection and dimension reduction is summarized as follows.

*Step 1. Initialization.* Train an improved Elman NN on the training dataset to obtain the initial weight parameters of the network.

*Step 2.* Select the input variable dataset.

*Step 2.1.* Initialize the variable data preprocessing method based on combining ISOMAP with LLE.

*Step 2.2.* For the current variable dataset, calculate the approximate geodesic distances, obtain the maximum eigenvalue $c^{*}$ of the matrix in equation (7), and construct the Mercer kernel matrix $K_{\mathrm{ISOMAP}}$ by equation (8).

*Step 2.3.* Obtain the real symmetric positive semidefinite matrix $M$ from the cost function in equation (9) and extract the kernel matrix $K_{\mathrm{LLE}}$ by equation (10).

*Step 2.4.* Calculate $K$ by equation (11) from the results of Steps 2.2 and 2.3.

*Step 2.5.* Calculate the embedding $Y$ by equation (12).

*Step 2.6.* Obtain the input variable data free of redundancy and noise.

*Step 3.* Evaluate the parameters by equation (19) and obtain the new parameters $\theta^{*}$ by equations (5) and (20).

*Step 4.* Update the weight parameters by equation (17) and obtain the new network by equation (18).

#### 4. Case Study: Application to Temperature Prediction in Rotary Kiln Calcining Process

The rotary kiln is an important part of pellet sintering. Green pellets 9–16 mm in diameter are produced in the pellet-making system and preheated on the chain grate. They then pass through the kiln tail into the rotary kiln. The rotary kiln completes the oxidizing roasting and consolidation of the pellets, which gives them the physical and chemical properties required of blast furnace burden. Real-time control of the temperature in the rotary kiln calcining process is therefore essential for ensuring and improving pellet quality. This temperature is also directly related to production efficiency, energy consumption, and harmful gas emissions.

Analysis of the heat balance mechanism and technological characteristics of the rotary kiln calcining process shows that the calcining temperature is coupled with multiple technological index variables in the grate system and the ring cooler system. Measuring this temperature directly with instruments is very difficult because of poor working conditions and the high maintenance cost of hardware sensors. Therefore, the soft sensor model based on the improved Elman NN with variable data selection and BIC-based optimization is applied to estimate the temperature in the rotary kiln calcining process. The inputs of the soft sensor model are technological index variables that are correlated with the temperature and easy to measure; they are listed in Table 1. In addition, various physical and chemical reactions in the rotary kiln also affect the calcining temperature. A more detailed description of the physicochemical mechanism of the sintering process is given in [31, 32].

##### 4.1. The Soft Sensor Model of Temperature in Rotary Kiln Calcining Process Based on Improved Elman NN with Variable Data Preprocessing

Based on the technological characteristics of the rotary kiln and a process analysis, the secondary variables are selected by combining ISOMAP with LLE through the kernel function, which removes redundant and abnormal data from the selected variables. The selected secondary variables are used as inputs to the soft sensor model based on the improved Elman NN to predict the temperature in the rotary kiln calcining process. The Bayesian information criterion is applied to evaluate the accuracy of the prediction model and to adjust the weights so as to reduce the error. The structure of the temperature prediction model for the rotary kiln calcining process, based on the improved Elman NN with variable data preprocessing, is shown in Figure 4.

##### 4.2. Results and Discussion

To train the improved Elman NN, the collected data samples are partitioned into two parts. The training set contains 3000 process datasets of the secondary variables and the corresponding temperature in the rotary kiln calcining process, and the testing set contains 1500 process datasets. The data preprocessing method based on ISOMAP and LLE is also used to remove the noise and data redundancy in the input data, which improves prediction performance. The SVM is a state-of-the-art soft sensor model with good generalization ability, so the improved Elman NN with ISOMAP-LLE preprocessing is compared with an SVM-based soft sensor model. The SVM uses the radial basis function kernel, and the number of support vectors is specified as 6. For a fair comparison, the Elman NN and the improved Elman NN use the same numbers of hidden and context nodes, with the number of hidden nodes set to 36.

In the experiments with the improved Elman NN and data preprocessing, gradient descent is used to train the network, and the datasets are selected by the ISOMAP and LLE method. The parameters are optimized by comparing validation error curves according to the Bayesian information criterion. Once the weight parameters are fine-tuned, all training data are used to retrain the model, and the learning rate is adapted to the change in the loss to obtain good performance. The full testing dataset is too large to display, so Figures 5–7 show only 100 testing samples for each model. The performance of the soft sensor models is evaluated with the root mean square error (*RMSE*), the maximum negative error (*MNE*), and the maximum positive error (*MPE*). Table 2 compares the values of these metrics on the testing dataset for the different soft sensor models.
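The three metrics can be computed as below. The text does not state a sign convention, so this sketch assumes the error is $e = \hat{y} - y$. Under that convention MNE is the most negative error and MPE the most positive one. `error_metrics` is a hypothetical name.

```python
import numpy as np


def error_metrics(y_true, y_pred):
    """Return (RMSE, MNE, MPE) with the assumed error e = y_pred - y_true."""
    e = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    rmse = float(np.sqrt(np.mean(e ** 2)))
    return rmse, float(e.min()), float(e.max())
```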

**Figures 5–7.** (a) Prediction curve of the kiln burning zone temperature; (b) error curve of the kiln burning zone prediction temperature.

Table 2 shows that the improved Elman NN with variable data preprocessing outperforms the other soft sensor models. The improved Elman NN learns better than the conventional models, and the proposed variable data preprocessing method effectively removes redundant data and noise.

Figures 5 and 6 show that the improved Elman NN predicts the temperature in the rotary kiln calcining process well. It generalizes and learns better than the soft sensor models based on the SVM and the conventional Elman NN. The error curves of the SVM, conventional Elman NN, and improved Elman NN predictions in Figures 5(b) and 6(b) clearly show the differences between the methods. The prediction error of the improved Elman NN is smaller than that of the SVM and the conventional Elman NN.

However, these three methods are limited in handling the noise and data redundancy in the input variable data. The soft sensor model based on the improved Elman NN with ISOMAP and LLE removes abnormal working-condition noise and data redundancy, as shown in Figure 7. When several abnormal noise samples are added in the early stage, this model effectively avoids the disturbance caused by the noise. Its prediction is smooth, without large fluctuations, and is more robust than the predictions of the SVM and of the improved Elman NN without preprocessing.

#### 5. Conclusion

This paper develops a soft sensor model based on an improved Elman NN and proposes a data preprocessing method based on ISOMAP and LLE for input variable data selection. The improved Elman NN adds local feedback and feedforward connections to the Elman NN. Training it yields better generalization and reduces overfitting to a certain extent. The ISOMAP-LLE preprocessing method preserves both the neighborhood structure and the global mutual distances of the datasets and better reflects the real character of the original data. It removes abnormal working-condition data, noise, and data redundancy from the input variables of the improved Elman NN. The resulting soft sensor model is applied to estimate the temperature in the rotary kiln calcining process. Its predictions closely follow the varying trend of the temperature and show good robustness and generalization performance. These results indicate that the soft sensor model based on the improved Elman NN with ISOMAP-LLE preprocessing is a powerful and promising method for complex industrial process applications.

#### Data Availability

The temperature data from the rotary kiln calcining process used to support the findings of this study have not been made available. They are proprietary data of a sintering and pelletizing company, which shares the data only with regulatory agencies and not with researchers, so access to the data is restricted.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61473054.