Abstract

In this paper, a novel soft sensor is developed by combining a long short-term memory (LSTM) network with normalized mutual information feature selection (NMIFS). In the proposed algorithm, the LSTM network is designed to handle the highly nonlinear and dynamic time series of industrial processes, while NMIFS performs input variable selection for the LSTM to reduce the excessive complexity of the model. The developed soft sensor thus combines the excellent dynamic modelling capability of LSTM with the precise variable selection of NMIFS. Simulations on two actual production datasets demonstrate that the developed soft sensor predicts the objective variables precisely and outperforms other methods.

1. Introduction

Due to technological constraints, sensor characteristics, environmental factors, etc., many variables in actual industrial processes cannot be measured, or can be measured only at a very low frequency. Soft measurement provides an excellent solution by constructing mathematical models that map easily measured variables to hard-to-measure ones [1–3]. Neural networks (NNs) can precisely model complex, nonlinear systems and have therefore been widely used in soft sensors [4–6]. Heidari et al. [7] developed a new multilayer perceptron (MLP) network to estimate nanofluid relative viscosity, which is more accurate than other NN structures. Sheela and Deepa [8] designed a synthesized model by combining self-organizing maps (SOMs) with an MLP and applied it to forecast the wind speed of a renewable energy process. He et al. [9] developed an auto-associative hierarchical NN for a soft sensor of chemical processes, and its application to a purified terephthalic acid solvent process demonstrated the effectiveness of the algorithm. Zabadaj et al. [10] proposed an effective soft sensor for the supervisory control of a biotransformation process and demonstrated its efficiency. Rehrla et al. [11] developed a soft sensor method for estimating the active pharmaceutical ingredient concentration from system data and tested the model on three different continuous production lines. A novel supervised latent factor analysis approach based on system data regression modelling, which can effectively handle heterogeneous variances, was proposed and used to establish soft sensors for quality estimation in two case studies [12].

However, industrial systems are intrinsically complex and exhibit high temporal correlations between dataset samples. That is, process data are time series with strong nonlinearities and dynamics, which increases the difficulty of modelling with conventional NNs. Recently, a powerful type of NN named long short-term memory (LSTM) was designed to handle sequence dependence [13–15]. An LSTM network is particularly effective at learning long-term temporal dependencies because its memory cells can maintain their state over long periods and regulate the information moving into and out of the cell. Therefore, LSTM networks have been applied effectively in many different fields, such as precipitation nowcasting [16], traffic forecasting [17], and human action recognition [18]. Due to these advantages, LSTM has also been applied in soft sensor development for industrial processes. Yuan et al. developed a supervised LSTM network for a soft sensor and demonstrated its superiority on two actual industrial datasets [19]. Sun proposed a new LSTM network combining unsupervised feature selection with supervised dynamic modelling for a soft sensor and validated it on a practical CO2 absorption column [20].

The rapid evolution of distributed control systems (DCSs) provides abundant data but also introduces another difficulty in nonlinear soft sensing: excessive input variables. If an NN is trained with excessive input variables, the amount of computation increases and more computing power is required. Meanwhile, the prediction accuracy of the NN deteriorates because extraneous variables introduce additional noise into the dataset. Hence, many researchers have focused on efficient variable selection approaches for soft sensors [21–23]. In recent years, mutual information (MI)-based variable selection approaches have been widely studied due to their efficacy and ease of implementation [24]. Hanchuan et al. [25] proposed a minimal-redundancy-maximal-relevance criterion that evaluates both relevance and redundancy to select significant input variables while reducing redundancy. Estevez et al. [26] proposed an enhanced version of MIFS and minimal-redundancy-maximal-relevance that introduced normalized MI as the measure of redundancy. The resulting NMIFS algorithm showed better performance by compensating for the bias of MI toward multivalued features and restricting its value to the range [0, 1].

This paper develops a new soft sensor algorithm by combining LSTM with NMIFS, in which NMIFS is used to compress the input variables of the LSTM. The primary contributions of the paper are summarized as follows:
(1) A novel feature selection approach for LSTM with NMIFS is designed. The developed method can effectively reduce the excessive complexity caused by redundant candidate variables and thus improve the modelling performance of LSTM.
(2) The developed soft sensor algorithm is implemented in two practical industrial processes.
(3) Comparative simulation results demonstrate that the developed soft sensor model has better performance and flexibility in performing feedback control.

This paper is arranged as follows: Section 2 presents background theories of the NMIFS and LSTM, and Section 3 describes the development of the presented approach. Section 4 presents the simulation results and an analysis of the developed soft sensor with datasets of actual processes. Some conclusions are given in Section 5.

2. Theoretical Overview

2.1. Input Variable Selection Techniques

The existence of redundant input variables in the training of an NN often complicates the model, deteriorates its accuracy, and can even cause overfitting. The goal of input variable feature selection (IVFS) is to select exactly n variables from the initial candidate variable set F as the input variable set S for modelling, where F includes all candidate input variables of the algorithm. During variable selection, variables that have little influence on the target variable are deleted from F.

The IVFS algorithm can be applied in a variety of ways: (1) sequential forward selection, where one input variable is selected from F and added to S at each step until the prediction accuracy of the model no longer improves; (2) sequential backward selection, where S initially includes all input variables and one variable is deleted at a time until the model performance no longer improves; or (3) global optimization, which searches for the optimal solution among all possible variable subsets.

It can be shown that if there are m candidate variables in F, selecting among the m input variables yields 2^m − 1 possible subsets in total. The optimal subset is hard to find when the number of candidate variables is large. Based on this consideration, a statistical indicator is chosen to measure the degree of dependence between input and output variables, and the input variables are then selected before modelling with NNs. Separating the variable selection procedure from the model calibration procedure in this way yields a more efficient IVFS algorithm, and the resulting subset S has wider applicability across different NN algorithms. It is worth noting that the effectiveness of an IVFS approach depends on the statistical criterion applied.

MI is considered an excellent evaluation criterion because it is a measure of statistical dependence that makes no assumptions about the structure of the dependencies between variables. MI is also insensitive to data transformations and noise.

2.2. Normalized Mutual Information

Suppose that there are two random variables X and Y, where X is the input variable and Y is the output variable that depends on X. For continuous variables, the MI of X and Y is defined as follows [27]:

I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / (p(x) p(y)) ] dx dy, (1)

where p(x, y) is the joint probability density function (PDF) of the two variables and p(x) and p(y) are the marginal PDFs of X and Y, respectively. Figure 1 shows the entropies of X and Y and their relationship to the MI, in which H(X) and H(Y) are entropies and H(X|Y) and H(Y|X) are conditional entropies, respectively.

Generally speaking, MI has three basic properties:
(1) Symmetry: I(X; Y) = I(Y; X). The quantity of information extracted from Y about X equals that extracted from X about Y; the only difference is the angle of the observer.
(2) Nonnegativity: I(X; Y) ≥ 0. When extracting information about one event from another, the worst case is zero information, I(X; Y) = 0: being aware of one event never increases the uncertainty of another.
(3) Extremum: I(X; Y) ≤ H(X) and I(X; Y) ≤ H(Y). The quantity of information extracted from one event about another is at most the entropy of the other event; it cannot exceed the amount of information the other event itself contains.

The MI measures the dependence between X and Y and thus provides reference information for variable selection algorithms, which makes the computation of MI a crucial step in MI-based input variable selection approaches [28, 29]. However, the mathematical form of the PDFs in equation (1) is unknown in practical problems. A variety of approximate MI estimation algorithms have been studied extensively to estimate the PDFs. For example, kernel density estimation (KDE) is an advanced technique that superposes a basis function, usually a Gaussian, on each data point; the PDF approximation is then obtained from the envelope of all the basis functions. Although such algorithms yield accurate approximations, their computational load is very high, especially in large-scale problems. Histogram methods provide a competitive alternative, with acceptable precision and far lower computational cost than KDE methods.
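As a concrete illustration of the histogram approach, the sketch below estimates MI from a 2-D histogram of the samples; the function name and the default bin count are illustrative choices, not values from the paper.

```python
import numpy as np

def histogram_mi(x, y, bins=16):
    """Estimate I(X; Y) (in nats) from samples via a 2-D histogram.

    A minimal sketch of the histogram method described above; the bin
    count is a tuning parameter assumed here for illustration.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # empirical joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of X (column vector)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of Y (row vector)
    mask = pxy > 0                            # skip empty bins to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```

Because the estimate is a KL divergence between the empirical joint and the product of its marginals, it is always nonnegative, and dependent variables score higher than independent ones.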

When MI is applied to practical cases, the calculated values fluctuate greatly, and it is difficult to directly compare the similarity of several candidate variables to the target variable [30]. This paper therefore introduces a method to normalize MI. There are several ways of doing so; the general idea is to use entropy as the denominator to regulate the value of MI to between 0 and 1. One common implementation is the following formula:

NMI(X; Y) = I(X; Y) / min{H(X), H(Y)}.

Then, NMI can be used to evaluate the resemblance between candidate and target input variables.
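A minimal sketch of this normalization, assuming the min-entropy denominator used in the NMIFS literature (other normalizations, e.g. the geometric mean of the entropies, also exist):

```python
import numpy as np

def normalized_mi(x, y, bins=16):
    """NMI(X; Y) = I(X; Y) / min(H(X), H(Y)), estimated from a 2-D histogram.

    Illustrative sketch; the min-entropy denominator follows the NMIFS
    convention, and the bin count is an assumed tuning parameter.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))   # marginal entropy H(X)
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))   # marginal entropy H(Y)
    mask = pxy > 0
    mi = np.sum(pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask]))
    return float(mi / min(hx, hy))
```

Since I(X; Y) ≤ min{H(X), H(Y)}, the result is confined to [0, 1], with 1 attained when one variable determines the other.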

2.3. Long Short-Term Memory

The LSTM network is well suited to predicting target variables with relatively long intervals and delays in the time series. The structure of an LSTM neuron is shown in Figure 2. It includes a cell state and three gates: the cell state records the neuron status, the input and output gates receive and output parameters, respectively, and the forget gate controls the degree of forgetting of the previous unit state [31, 32].

The detailed structure and operation mechanism of LSTM are shown in Figure 3. The forgotten part of the memory unit is decided by the input x_t in the forgetting gate together with the state memory unit c_{t−1} and the intermediate output h_{t−1}. The retention vector in the memory unit is determined by the transformed input in the input gate through the sigmoid and tanh functions. The intermediate output h_t is determined by the updated cell state c_t and the output gate o_t. The calculation formulas are as follows:

f_t = σ(W_f x_t + U_f h_{t−1} + b_f),
i_t = σ(W_i x_t + U_i h_{t−1} + b_i),
g_t = tanh(W_g x_t + U_g h_{t−1} + b_g),
o_t = σ(W_o x_t + U_o h_{t−1} + b_o),
c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t,
h_t = o_t ⊙ tanh(c_t),

where f_t, i_t, g_t, o_t, h_t, and c_t are the states of the forgetting gate, input gate, input node, output gate, intermediate output, and status unit, respectively; W_f, W_i, W_g, W_o and U_f, U_i, U_g, U_o are the weight matrices multiplying the input x_t and the intermediate output h_{t−1} of the corresponding gates, respectively; b_f, b_i, b_g, and b_o are the biases of the corresponding gates; ⊙ indicates element-wise multiplication of vectors; and σ and tanh represent the sigmoid and tanh transformations, respectively.
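The gate computations above can be sketched as a single LSTM time step; the function and the dictionary-based weight layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step following the standard gate equations.

    W, U, b are dicts keyed by gate name ('f', 'i', 'g', 'o');
    shapes: W[k] (hidden, input), U[k] (hidden, hidden), b[k] (hidden,).
    """
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate
    g_t = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # input node
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate
    c_t = f_t * c_prev + i_t * g_t                          # cell state update
    h_t = o_t * np.tanh(c_t)                                # intermediate output
    return h_t, c_t
```

Note that h_t is bounded in (−1, 1) because it is the product of a sigmoid and a tanh, while c_t can grow over time, which is what lets the cell carry long-term information.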

3. Development of NMIFS-LSTM

The evaluation function plays a pivotal role in MI feature selection and directly affects the final performance of the algorithm. The most direct solution is to select the input variable with the largest MI with the output variable C. The evaluation function is

J(f_i) = I(C; f_i).

The MIFS [33] method introduces a penalty term into the relevance measure, thereby accounting for both correlation and redundancy between variables. The evaluation function is

J(f_i) = I(C; f_i) − β Σ_{f_s ∈ S} I(f_i; f_s),

where S is the selected feature subset, f_s is a selected feature, and the parameter β controls the degree of penalty for redundant terms.

To reduce the dependence on the parameter β, Kwak and Choi [34] proposed the MIFS-U method, whose evaluation function is

J(f_i) = I(C; f_i) − β Σ_{f_s ∈ S} [ I(C; f_s) / H(f_s) ] I(f_i; f_s).

Hanchuan et al. [25] enhanced MIFS and developed the minimal-redundancy-maximal-relevance algorithm, which replaces the parameter β with the mean value of MI over the selected subset as the redundancy measure, thereby avoiding the selection of β. The evaluation function is

J(f_i) = I(C; f_i) − (1/|S|) Σ_{f_s ∈ S} I(f_i; f_s).

Estevez et al. [26] defined the normalized MI between variables and proposed the NMIFS algorithm. Its evaluation function can be expressed as

J(f_i) = I(C; f_i) − (1/|S|) Σ_{f_s ∈ S} NI(f_i; f_s),

where the normalized MI is NI(f_i; f_s) = I(f_i; f_s) / min{H(f_i), H(f_s)}. The normalized MI compensates for the bias of MI toward multivalued variables, and its value is strictly restricted to the interval [0, 1].
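The greedy ranking implied by this criterion can be sketched as follows, assuming the relevance values I(C; f_i) and the pairwise normalized-MI matrix have already been estimated (e.g. by histogram methods); the function names are illustrative.

```python
import numpy as np

def nmifs_score(relevance, nmi_matrix, candidate, selected):
    """NMIFS criterion J(f_i) = I(C; f_i) - (1/|S|) * sum_s NI(f_i; f_s).

    relevance[i]    = I(C; f_i), relevance to the target C.
    nmi_matrix[i,j] = NI(f_i; f_j), normalized MI between features.
    """
    if not selected:
        return relevance[candidate]
    redundancy = np.mean([nmi_matrix[candidate, s] for s in selected])
    return relevance[candidate] - redundancy

def nmifs_select(relevance, nmi_matrix, k):
    """Greedily pick k feature indices by the NMIFS criterion."""
    n = len(relevance)
    selected = [int(np.argmax(relevance))]   # start with the most relevant
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        best = max(remaining,
                   key=lambda i: nmifs_score(relevance, nmi_matrix, i, selected))
        selected.append(int(best))
    return selected
```

With this criterion, a feature that is highly relevant but nearly duplicates an already selected one is penalized and can lose to a less relevant but non-redundant feature.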

In this paper, a novel variable selection method, NMIFS-LSTM, is developed. This method combines NMIFS and LSTM and uses the root mean square error (RMSE) of the LSTM network as the evaluation standard. The proposed algorithm aims to eliminate redundant variables and improve model accuracy. The pseudocode of NMIFS-LSTM is shown in Algorithm 1.

Input: dataset
Output: predicted value
Begin algorithm
 Initialize
 LSTM is trained to determine network hyperparameters and network structure;
 Set F = {f_1, …, f_n}; S = ∅ (n = number of candidate input variables);
 Compute NMI and train the LSTM;
  For i = 1:j (j is the iteration limit of the stop criterion)
   ∀ f_i ∈ F compute I(C; f_i);
   Find the first variable f_i that maximizes I(C; f_i) and obtain RMSE;
   Set F ← F \ {f_i}; set S ← {f_i};
    Choose the next variable f_i = argmax_{f_i ∈ F} {I(C; f_i) − (1/|S|) Σ_{f_s ∈ S} NI(f_i; f_s)} and obtain the new RMSE;
   Set F ← F \ {f_i}; set S ← S ∪ {f_i};
   If new RMSE > RMSE
    Break
   Else
    RMSE = new RMSE; return and select the next variable;
   End if
    Repeat until |S| = j;
  End for
 Retrain with selected subset
 Calculate predicted value
End algorithm

The operating mechanism of the NMIFS-LSTM algorithm is divided into two parts. First, an LSTM model for prediction is built, and NMI is used for variable selection. The LSTM network is trained to determine the network hyperparameters and structure. Parameter F is initialized to the full set of n candidate variables and S to the empty set. NMI is then computed, the first variable is chosen, and the RMSE of the corresponding LSTM model is obtained. We continue to choose the next variable at each step until the model gets worse or the stop criterion is met. Finally, the selected subset is used to retrain the model and the predicted value is obtained. The flowchart of the developed NMIFS-LSTM-based soft sensor model is shown in Figure 4.
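The RMSE-based early-stopping part of this mechanism can be sketched as follows. To keep the example self-contained, an arbitrary `fit_predict` model wrapper stands in for the LSTM (an assumption for illustration; the paper trains an LSTM at each step), and `ranked` is the variable ordering produced by the NMIFS criterion.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def forward_select_by_rmse(X, y, ranked, fit_predict):
    """Add features in the given NMIFS ranking until RMSE stops improving.

    `ranked` is a list of column indices in NMIFS order; `fit_predict`
    is any model wrapper returning predictions for (X_subset, y).
    Stops as soon as adding a variable makes the RMSE worse, keeping
    the previous (best) subset, as in the algorithm described above.
    """
    best_rmse, best_subset = np.inf, []
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        new_rmse = rmse(y, fit_predict(X[:, subset], y))
        if new_rmse > best_rmse:
            break                      # model got worse: stop and keep previous subset
        best_rmse, best_subset = new_rmse, subset
    return best_subset, best_rmse
```

Any regressor can be plugged in as `fit_predict`; in the paper's setting it would retrain the LSTM on the candidate subset at each step.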

4. Simulation Results and Discussion

In this paper, all algorithms use a common dataset with the same variable selection method, after several trials in the same simulation environment. All models were simulated in the same experimental environment: the simulation programs were coded in MATLAB 2019 and run under the Windows 8.1 operating system. The simulation results are recorded with the following measures:
(1) Model size (MS) is the number of candidate variables selected in the final model.
(2) PMSE denotes the mean square error (MSE), a measure that reflects the difference between the actual and predicted values and is calculated as

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2,

where y_i and ŷ_i are the actual and predicted values of the output variable, respectively, and n is the number of testing samples.
(3) The coefficient of determination R^2 is the square of the sample correlation coefficient between the real values and the predicted values.
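Although the paper's simulations were run in MATLAB, the two evaluation measures can be sketched compactly; the function names here are illustrative.

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error over the n testing samples, as in the formula above."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def r_squared(y, y_hat):
    """Coefficient of determination as the squared sample correlation
    between actual and predicted values (the definition used above)."""
    r = np.corrcoef(y, y_hat)[0, 1]
    return float(r ** 2)
```

Note that this correlation-based R^2 rewards predictions that track the shape of the target even when they are biased; the MSE complements it by penalizing absolute errors.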

4.1. Application to a Debutanizer

To verify the efficacy of the developed soft sensor model, it was applied to a real debutanizer column. The flow diagram of the actual debutanizer column unit is given in Figure 5. In the refining industry, the main function of this process is to separate butane from natural gas. First, the entering liquid is heated into hot vapour and sent into the main tower (T102). The hot vapour condenses into liquid and is separated into a set of fractions with different boiling points. Under normal circumstances, butane and propane are removed in the column after treatment, leaving the natural gas as almost pure methane.

In this case, the butane content is very important for ensuring product quality during the process. However, this variable is very hard to measure in real time. Hence, an online soft sensing model was proposed to forecast the content. Seven practical sensors were installed in the process, marked as yellow circles in the schematic diagram [35] displayed in Figure 5. All candidate variables are listed in Table 1.

A total of 2394 data samples were collected at intervals of 15 minutes. The dataset was separated into two parts: the first 80% was used for training and the rest for testing. According to plant experts' guidance [22], the time delay of the process was probably 20–60 minutes. Based on this advice, we extended each input variable u(t) with its time-lagged values u(t − k), in which u(t − k) denotes the value of u at time t − k. In addition, we added lagged values of the output variable y to each group to enhance the modelling accuracy. The number of candidate variables was thus raised from 7 to 40, which added further complexity to the process.
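This lag expansion can be sketched as follows; the function name and the uniform lag depth are illustrative assumptions, since the paper does not specify the exact lag layout per variable.

```python
import numpy as np

def add_lags(X, lags):
    """Expand an (N, m) data matrix with lagged copies of each column.

    Returns an (N - lags, m * (lags + 1)) matrix whose row t holds
    [X(t), X(t-1), ..., X(t-lags)]; the first `lags` rows are dropped
    because they lack a full history.
    """
    N, m = X.shape
    blocks = [X[lags - k : N - k, :] for k in range(lags + 1)]
    return np.hstack(blocks)
```

Applied to the seven debutanizer variables with a suitable lag depth (plus lagged outputs), this kind of expansion produces the enlarged candidate pool described above.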

Table 2 presents the experimental results of the four algorithms. The table shows that the NMIFS-LSTM algorithm has an obvious advantage over the others in model accuracy. The simulation results show that NMIFS-LSTM produces a more compact and more accurate model than the other methods.

Figure 6 shows the real and predicted values of the target variable obtained by applying the NMIFS-LSTM model. The fitted curve clearly illustrates that our approach successfully follows the variations of the butane content, which further verifies its efficacy.

4.2. Application to Power Plant Desulfurization Technology

The flue gas desulfurization system and industrial process parameters were collected from unit 9 of a thermal power plant, which implements limestone-gypsum wet flue gas desulfurization technology with twin towers. The SO2 in the system is absorbed by lime or limestone through a chemical reaction. Compared with a single tower, the twin towers allow a secondary reaction of the transmitted flue gas and eliminate SO2 from the flue gas more effectively. The flue gas desulfurization process includes the SO2 absorption system, flue gas system, mist eliminator system, absorption tower overflow device, slurry mixing system of the absorption tower, oxidizing blower, etc. The twin towers have an absorption area of 12 meters in diameter and a height of 32.6 meters. The flue gas containing SO2 enters at the bottom of the primary absorption tower (PAT), moves from bottom to top, and encounters a liquid suspension from the spray layer. The SO2 reacts chemically with the alkaline suspension through the gas film and the liquid film in a molecular diffusion manner. The PAT includes four spray layers driven by circulating pumps, as shown in Figure 7.

This paper takes the desulfurization index parameters of unit 9 of a thermal power plant as the research object. The dataset includes 30 input variables and one target output variable, the flue gas SO2 concentration. All candidate variables are given in Table 3. The time span is from July 1, 2019, to July 7, 2019, with a time interval of 1 min and a total of 10000 samples. The first 8000 samples are used as training data and the rest as testing data. In the practical simulation experiment, redundant variables in the pool of candidates can lead to unsuitable modelling. Consequently, the IVFS technique is very important for building a suitable and stable soft measurement model.

Table 4 presents the statistics of the data-driven models built with different algorithms. The experimental results show that NMIFS-LSTM achieves better performance with fewer input variables than the other approaches. The R^2 of NMIFS-LSTM is higher than 90%, indicating that the proposed soft sensor can precisely forecast the actual values.

Figure 8 shows the prediction curve of SO2 concentration produced by the NMIFS-LSTM algorithm. NMIFS-LSTM tracks the dynamic changes of the target variable effectively, which demonstrates the effectiveness of our algorithm.

5. Conclusion

In this paper, a novel soft sensor was designed to model complex and dynamic industrial processes with time series characteristics. The LSTM network is trained on datasets taken from actual processes, and NMI is applied to select the variables related to the target variable. The proposed algorithm adds or removes one variable at each step, tracing out a variable selection path, and takes the subset with the lowest prediction error. The proposed soft sensor was applied to two practical industrial processes. The simulations and comparisons with other algorithms demonstrate the effectiveness and superiority of our approach. The developed soft sensor provides an additional, reliable monitoring tool for pivotal variables and can be further applied in the design of model predictive control systems.

The proposed soft sensor algorithm is easy to implement, and the related program can be stored as a subroutine in the industrial computer of the DCS. By calling the subroutine, the soft sensor can be periodically retrained and updated with new production data, which largely mitigates the problem of model degradation.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Dongfeng Li and Zhirui Li contributed equally to this article.

Acknowledgments

The authors acknowledge the Yau Mathematical Sciences Center and Huangtai Power Plant for data gathering and experiment. This work was partially supported by the Major Science and Technology Innovation Projects of Shandong Province (Grant no. 2019JZZY010731), the Key Research and Development Program of Shandong Province (Grant no. 2019GGX104037), the S.-T. Yau High School Science Award (Computer), and the National Natural Science Foundation of China (Grant no. 51874300).