Abstract
A novel optimization algorithm for multilayer perceptron (MLP) based soft sensors is proposed in this paper. The proposed approach integrates input variable selection and hidden layer optimization of the MLP into a single constrained optimization problem. The nonnegative garrote (NNG) is used to shrink the input variables and optimize the hidden layer simultaneously. The optimal garrote parameter of the NNG is determined by combining cross-validation with the Hannan–Quinn information criterion. The performance of the algorithm is demonstrated on an artificial dataset and on a practical application, the desulfurization process in a thermal power plant. Comparative results demonstrate that the developed algorithm builds simpler and more accurate models than other state-of-the-art soft sensor algorithms.
1. Introduction
In complex industrial processes, important process parameters that influence product quality or energy consumption need to be monitored and controlled in real time and with high accuracy. However, some of them are difficult to measure directly with hardware sensors due to the limitations of existing field conditions [1–3]. Soft sensors model these hard-to-measure parameters mathematically through auxiliary variables that are easy to measure [4, 5]. Basically, there are two categories of soft sensor techniques: mechanism analysis-based approaches and data-driven approaches. The mechanism analysis-based approaches require an accurate understanding of the inherent mechanism of complex industrial processes, which is very difficult to obtain. Data-driven algorithms provide advanced alternatives based on statistical inference and machine learning techniques [6, 7]. In recent years, data-driven soft sensors including principal component regression (PCR), partial least squares (PLS) regression, support vector machine (SVM), extreme learning machine (ELM), and artificial neural networks (ANNs) have been widely studied [8–12].
Due to their powerful nonlinear modeling competence, ANNs have become the most popular nonlinear modeling techniques. There are a variety of ANNs such as convolutional neural networks (CNN) [13], generative adversarial networks (GAN) [14], radial basis networks, and recurrent neural networks (RNN) [15], each of which has its own characteristics and advantages. Among them, the multilayer perceptron (MLP) is the most widely used technique for nonlinear soft sensing owing to its outstanding nonlinear mapping capability and convenience of application. Heidari et al. [16] built an accurate predictive model of nanofluid viscosity with MLP. Shen et al. [17] presented an MLP-based recursive sliding mode dynamic surface control scheme for a fully actuated surface vessel with uncertain dynamics and external disturbances. In [18], MLP was applied to predict the water content of biodiesel and diesel blend in terms of temperature and composition.
With the rapid development of process automation, more and more variables are involved in the modern process industry. Redundant input variables increase the model complexity, lengthen the training time, and decrease the predictive accuracy of the model [19, 20]. Variable selection technology provides a good solution to this problem and therefore is extensively studied [1, 21, 22]. Guo et al. [23] proposed an input variable selection method for a feedforward neural network (FNN) by using the partial autocorrelation function and successfully forecasted the wind speed. Fock [24] proposed a new algorithm for the selection of input variables, in which the global sensitivity analysis technique was used to select the optimal input variables. Adil et al. [25] presented a new variable selection algorithm that used the heuristic method and minimum redundancy maximum relevance, and the experimental results showed better accuracy than other algorithms. In [26], a neural network-based soft sensor was developed to predict effluent concentrations in a biological wastewater treatment plant, in which principal component analysis (PCA) was implemented to select optimal input variables.
The nonnegative garrote (NNG) is a linear coefficient shrinkage approach based on a penalized likelihood function. In recent years, it has been widely used in the variable selection of ANNs [27]. Sun et al. [28] utilized the NNG to compress the input weights of the MLP to achieve nonlinear variable selection, and the superiority of the proposed algorithm was shown through two artificial dataset examples and a real industrial application. In [29], a local search strategy was incorporated into the NNGMLP to improve its performance. However, these algorithms only consider the selection of input variables and ignore the optimization of the internal structure of the MLP network. In fact, redundant hidden-layer nodes worsen the performance of the MLP just as redundant input variables do, and can even lead to overfitting of the model. Pan et al. [30] proposed a novel approach to simplifying the structure of a deep neural network through regularization of the network architecture. Anbananthen et al. [31] presented a pruning procedure by which redundant links were deleted from the trained network. Monika and Venkatesan [32] designed a divisive ANN clustering algorithm to prune the neurons of the hidden layer of the MLP, which improved model accuracy. Fan et al. proposed an algorithm, named dLASSO, that utilized the least absolute shrinkage and selection operator (LASSO) to perform the selection of input variables and the optimization of the hidden layer of the MLP [33]. However, the variable selection and hidden layer optimization of dLASSO are independent of each other, so the jointly optimal solution may be missed.
According to our investigation, few existing methods deal with the redundancy of input variables and hidden layers of ANN models at the same time. In this paper, a novel algorithm that performs global dimension reduction and structure simplification for MLP-based soft sensors is proposed by combining NNG and MLP. The MLP is implemented to cope with the nonlinear dynamics of the industrial processes, and the NNG is devised to conduct the selection of the input variables and the simplification of the hidden layers. To the best of our knowledge, this algorithm is an innovative design of a penalty function-based strategy for globally optimizing the structure of ANNs. The effectiveness of the developed algorithm is validated on an artificial dataset and on a practical industrial process.
The remainder of this paper is organized as follows. The background theories of the approach are reviewed in Section 2. Section 3 describes the detailed principles and development of the proposed algorithm. The simulation results and analysis of the artificial datasets and the practical industrial process are presented in Section 4. Finally, some concluding remarks are given in Section 5.
2. Theoretical Background
The architecture of the three-layer MLP discussed in this paper is shown in Figure 1; it is composed of an input layer, a hidden layer, and an output layer. The number of neurons in the input layer depends on the number of variables (columns) of the input dataset, while that of the hidden layer is usually chosen by trial and error. The mathematical expression of the studied MLP is

$$\hat{y} = g\left(W_2\, f\left(W_1 x + b_1\right) + b_2\right), \tag{1}$$

where $f$ and $g$ denote the activation functions of the hidden and output layer, respectively, $x$ is the vector of input variables, and $\hat{y}$ is the output variable. The weight $W_1$ is a matrix that links the nodes of the input and hidden layer. $b_1$ is the bias vector of the hidden nodes. $W_2$ represents the matrix of output weights linking the hidden and the output layer. The output bias is denoted as $b_2$.
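The forward pass of such a three-layer MLP can be sketched in a few lines. The following is a minimal illustration only, assuming a single output, a tanh hidden layer, and a linear output layer; the names `mlp_forward`, `W1`, `b1`, `W2`, and `b2` are illustrative choices, not notation fixed by the paper.

```python
import math

def mlp_forward(x, W1, b1, W2, b2, f=math.tanh, g=lambda z: z):
    """Forward pass of a three-layer MLP with a single output.

    x  : list of p inputs
    W1 : q rows, each holding p input-to-hidden weights
    b1 : list of q hidden biases
    W2 : list of q hidden-to-output weights
    b2 : scalar output bias
    f, g : hidden and output activation functions (assumed tanh / identity)
    """
    hidden = [f(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return g(sum(w * h for w, h in zip(W2, hidden)) + b2)

# Tiny example: 2 inputs, 2 hidden nodes, 1 output (weights are made up).
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [1.0, -1.0]
b2 = 0.5
y_hat = mlp_forward([0.0, 0.0], W1, b1, W2, b2)  # both hidden outputs are tanh(0) = 0
```

With zero inputs and zero hidden biases the hidden layer outputs vanish, so the prediction reduces to the output bias.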
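For the linear regression problem

$$y = X\beta + \varepsilon, \tag{2}$$

where $\beta$ is the vector of magnitude coefficients and $\varepsilon$ is the random error, Breiman proposed a constraint on the sum of shrinkage coefficients and imposed it on the ordinary least squares (OLS) regression model [34]:

$$c^{*}(s) = \arg\min_{c} \sum_{k=1}^{n}\Big(y_k - \sum_{i=1}^{p} c_i \hat{\beta}_i x_{ki}\Big)^2 \quad \text{s.t. } c_i \ge 0,\ \sum_{i=1}^{p} c_i \le s, \tag{3}$$

in which $\hat{\beta}$ represents the coefficient vector of the OLS estimate, $c = (c_1, \ldots, c_p)$ is the vector of shrinkage coefficients, and $s$ is the garrote parameter. $X$ is the $n \times p$ input dataset, in which each column corresponds to a candidate input variable, and $y$ is the dataset of the output variable.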
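To see how the garrote parameter drives coefficients to zero, it helps to look at a special case that is not treated in this paper: an orthonormal design ($X^{T}X = I$). Under that assumption the constrained problem has the closed-form solution $c_i = \max(0, 1 - \lambda / (2\hat{\beta}_i^{2}))$, where the multiplier $\lambda \ge 0$ is chosen so that the budget $s$ is met. The sketch below finds $\lambda$ by bisection; the function name `nng_orthonormal` is hypothetical.

```python
def nng_orthonormal(beta_ols, s, iters=200):
    """Garrote coefficients c for an orthonormal design (X^T X = I).

    Minimising the residual sum of squares subject to c_i >= 0 and
    sum(c_i) <= s gives c_i = max(0, 1 - lam / (2 * beta_i**2)), where the
    multiplier lam >= 0 is chosen so that the budget s is respected.
    """
    def coeffs(lam):
        return [max(0.0, 1.0 - lam / (2.0 * b * b)) if b != 0.0 else 0.0
                for b in beta_ols]

    if sum(coeffs(0.0)) <= s:            # budget not binding: OLS is kept
        return coeffs(0.0)
    lo, hi = 0.0, 2.0 * max(b * b for b in beta_ols)  # at hi every c_i is 0
    for _ in range(iters):               # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if sum(coeffs(mid)) > s:
            lo = mid
        else:
            hi = mid
    return coeffs(hi)                    # hi always satisfies the budget

c_loose = nng_orthonormal([2.0, 1.0], 2.0)   # budget large enough for both
c_tight = nng_orthonormal([2.0, 1.0], 0.5)   # small budget removes the weaker one
```

In this toy case the variable with the smaller OLS coefficient is shrunk to exactly zero first as $s$ decreases, which is the selection behaviour the NNG is used for.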
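In [28], the NNG algorithm was devised to select the input variables of the MLP by imposing the shrinkage coefficients $c$ on the input layer:

$$\hat{y} = g\left(W_2\, f\left(W_1 (c \odot x) + b_1\right) + b_2\right), \tag{4}$$

where $\odot$ denotes the element-wise product and $W_1$, $b_1$, $W_2$, and $b_2$ are the weights and biases of the trained MLP. Equation (3) is consequently reformulated as

$$c^{*}(s) = \arg\min_{c} \sum_{k=1}^{n}\Big(y_k - g\big(W_2\, f(W_1 (c \odot x_k) + b_1) + b_2\big)\Big)^2 \quad \text{s.t. } c_i \ge 0,\ \sum_{i=1}^{p} c_i \le s, \tag{5}$$

where $(x_k, y_k)$ is the $k$th training sample.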
3. Development of GNNGMLP Algorithm
3.1. Design of Global Optimization for MLP
In this study, a global optimization algorithm for MLP-based soft sensors, called GNNGMLP, is proposed to reduce the redundancy of the input and hidden layers simultaneously. The primary strategy of the proposed algorithm is to design a nonlinear quadratic optimization problem with an NNG constraint that imposes shrinkage coefficients on the input and hidden layers of the MLP. The GNNGMLP is implemented by continuously adjusting the garrote parameter. The schematic diagram of the proposed algorithm is illustrated in Figure 2, in which the input and hidden nodes whose shrinkage coefficients become zero have no impact on the model and are removed from the MLP. The weights connected to them become invalid as well.
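The proposed algorithm is divided into two phases. In the first phase, a well-trained MLP network is obtained with the conventional MLP training algorithm. In the second phase, a set of shrinkage coefficients is imposed on the input and hidden layers of the obtained MLP. Consequently, the expression of the MLP is reformulated as

$$\hat{y} = g\left(W_2\,\big(d \odot f\left(W_1 (c \odot x) + b_1\right)\big) + b_2\right), \tag{6}$$

where $c = (c_1, \ldots, c_p)$ and $d = (d_1, \ldots, d_q)$ denote the shrinkage coefficients of the nodes of the input and hidden layer, respectively, $p$ and $q$ are the numbers of input and hidden nodes, and $\odot$ denotes the element-wise product. $c$ and $d$ are obtained by solving

$$(c^{*}, d^{*}) = \arg\min_{c,\,d} \sum_{k=1}^{n}\Big(y_k - g\big(W_2\,(d \odot f(W_1 (c \odot x_k) + b_1)) + b_2\big)\Big)^2 \quad \text{s.t. } c_i \ge 0,\ d_j \ge 0,\ \sum_{i=1}^{p} c_i + \sum_{j=1}^{q} d_j \le s, \tag{7}$$

where $c_i = 0$ indicates that the input variable $x_i$ is removed from the MLP and $d_j = 0$ means that the $j$th hidden node is excluded from the model. Equation (7) is a nonlinear quadratic optimization problem with constraints that can be solved with the trust-region reflective optimization algorithm [35]. After that, the optimal predictive model of the MLP is given by

$$\hat{y} = g\left(W_2\,\big(d^{*} \odot f\left(W_1 (c^{*} \odot x) + b_1\right)\big) + b_2\right). \tag{8}$$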
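The second phase can be made concrete with a small sketch of the shrunk network and of the quantities the optimizer works with. This is an illustration under stated assumptions rather than the authors' implementation: a tanh hidden layer and a linear output are assumed, the constraint is taken to be a single budget on the sum of all shrinkage coefficients, and the names `gnng_forward`, `sse`, and `feasible` are hypothetical. The solver itself (trust-region reflective in the paper) is not reproduced.

```python
import math

def gnng_forward(x, c, d, W1, b1, W2, b2):
    """MLP output with shrinkage coefficients c (inputs) and d (hidden nodes).

    Assumes a tanh hidden layer and a linear output layer.
    """
    xs = [ci * xi for ci, xi in zip(c, x)]                # shrink the inputs
    hidden = [dj * math.tanh(sum(w * v for w, v in zip(row, xs)) + b)
              for dj, row, b in zip(d, W1, b1)]            # shrink hidden outputs
    return sum(w * h for w, h in zip(W2, hidden)) + b2

def sse(data, c, d, W1, b1, W2, b2):
    """Objective of the constrained problem: sum of squared errors."""
    return sum((y - gnng_forward(x, c, d, W1, b1, W2, b2)) ** 2
               for x, y in data)

def feasible(c, d, s):
    """Assumed constraint set: c_i >= 0, d_j >= 0, sum(c) + sum(d) <= s."""
    return min(list(c) + list(d)) >= 0.0 and sum(c) + sum(d) <= s

# Made-up weights for a network with 2 inputs and 2 hidden nodes.
W1 = [[0.5, -1.0], [2.0, 0.3]]
b1 = [0.1, -0.2]
W2 = [1.5, -0.7]
b2 = 0.2
```

Setting `c[1] = 0` makes the output independent of the second input, and setting `d[1] = 0` removes the contribution of the second hidden node, which is exactly the pruning effect described above.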
3.2. Determination of Parameter s
The choice of the parameter $s$ is very important for the developed algorithm because it directly affects the extent of shrinkage of the MLP structure. $s = 0$ implies that all input variables and hidden nodes are eliminated. When $s = p + q$, where $p$ and $q$ are the numbers of input and hidden nodes, all the input variables and hidden nodes are completely preserved. Therefore, the value of $s$ directly determines the number of neurons and influences the performance of the MLP. This paper adopts the enumeration approach to select the optimal $s$ from the vector $[s_1, s_2, \ldots, s_K]$. Herein, $s_1$ is set to a constant close to zero, and $s_K$ is set to $p + q$. The other values of $s$ are evenly spaced between $s_1$ and $s_K$.
In this paper, the Hannan–Quinn information criterion (HQ) [36], which balances the accuracy and complexity of a model, is adopted as the model evaluation criterion. It is formulated as

$$\mathrm{HQ} = n \ln\!\left(\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2\right) + 2k \ln(\ln n), \tag{9}$$

where $n$ denotes the number of data samples, $k$ represents the number of input variables, and $y_i$ and $\hat{y}_i$ are the actual and predicted values of the output variable, respectively. Considering the randomness of ANNs, the V-fold cross-validation (CV) method is used to validate the model. The procedure is as follows. First, the whole dataset is evenly separated into V sub-datasets. Second, a single sub-dataset is taken as the validation dataset, and the other V − 1 sub-datasets are used as the training dataset to obtain the trained MLP. The procedure is repeated V times, and the V results are averaged to give the final estimate. In this work, $s$ is chosen by V-fold CV with HQ, whose pseudocode is shown in Algorithm 1.
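The selection of $s$ by V-fold CV with HQ can be sketched as follows. This is not a reproduction of Algorithm 1; it is a minimal outline that assumes the common least-squares form of the HQ criterion, interleaved folds, and a hypothetical callback `fit_and_score` standing in for "solve the shrinkage problem for a given $s$ on the training folds and return HQ on the validation fold".

```python
import math

def hq(y, y_hat, k):
    """Hannan-Quinn criterion, least-squares form (lower is better).

    Assumes at least two samples and a nonzero residual sum of squares.
    """
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return n * math.log(rss / n) + 2.0 * k * math.log(math.log(n))

def v_folds(n, V):
    """Split indices 0..n-1 into V nearly equal validation folds."""
    return [list(range(i, n, V)) for i in range(V)]

def select_s(s_grid, n, V, fit_and_score):
    """Return the candidate s with the lowest HQ averaged over V folds.

    fit_and_score(s, train_idx, val_idx) is a hypothetical callback that
    fits the shrinkage coefficients for s on the training indices and
    returns the HQ value obtained on the validation indices.
    """
    best_s, best_hq = None, float("inf")
    for s in s_grid:
        scores = []
        for val_idx in v_folds(n, V):
            held_out = set(val_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            scores.append(fit_and_score(s, train_idx, val_idx))
        mean_hq = sum(scores) / V
        if mean_hq < best_hq:
            best_s, best_hq = s, mean_hq
    return best_s
```

A candidate grid can be built as evenly spaced values between a small positive constant and the largest admissible $s$; the enumeration cost then grows linearly with the number of candidates times V.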

3.3. The Computational Procedure of Proposed Algorithm
In this paper, a global optimization algorithm for the MLP is developed. The advantage of the proposed algorithm is that it not only deals with the redundancy of input variables but also simplifies the internal structure of the MLP. The overall computation flow of the algorithm is as follows:
Step 1. Initialization: obtain a trained MLP with the training dataset.
Step 2. Impose the NNG coefficients on the input and hidden nodes of the MLP.
Step 3. Perform Algorithm 1 to obtain the optimal $s$, denoted $s^{*}$.
Step 4. Acquire the shrinkage coefficients $c^{*}$ and $d^{*}$ by solving equation (7) with parameter $s^{*}$.
Step 5. Update the weights of the input and hidden nodes by substituting $c^{*}$ and $d^{*}$ into equation (8).
Step 6. Remove from the input dataset $X$ the columns whose corresponding coefficient $c_i^{*} = 0$, and delete the hidden nodes whose corresponding coefficient $d_j^{*} = 0$.
Step 7. Output the optimized MLP.
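Steps 5 and 6 amount to folding the shrinkage coefficients into the trained weights and then deleting the nodes whose coefficient is zero. The sketch below shows one way this could be done for a single-output network; the function name `fold_and_prune` and the tolerance argument are illustrative assumptions.

```python
def fold_and_prune(c, d, W1, b1, W2, tol=0.0):
    """Fold shrinkage coefficients into the weights, then drop dead nodes.

    c  : shrinkage coefficients of the p inputs
    d  : shrinkage coefficients of the q hidden nodes
    W1 : q rows of p input-to-hidden weights, b1 : q hidden biases
    W2 : q hidden-to-output weights
    An input i is dropped when c[i] <= tol, a hidden node j when d[j] <= tol.
    Returns kept input indices, kept hidden indices, and the reduced
    W1, b1, W2 (the output bias is unchanged).
    """
    keep_in = [i for i, ci in enumerate(c) if ci > tol]
    keep_hid = [j for j, dj in enumerate(d) if dj > tol]
    W1_new = [[W1[j][i] * c[i] for i in keep_in] for j in keep_hid]
    b1_new = [b1[j] for j in keep_hid]
    W2_new = [W2[j] * d[j] for j in keep_hid]
    return keep_in, keep_hid, W1_new, b1_new, W2_new

# Made-up example: 3 inputs, 2 hidden nodes; input 1 and hidden node 1 vanish.
keep_in, keep_hid, W1_new, b1_new, W2_new = fold_and_prune(
    c=[0.5, 0.0, 1.0], d=[1.0, 0.0],
    W1=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], b1=[0.1, 0.2], W2=[2.0, 3.0])
```

Because each $c_i$ multiplies an input and each $d_j$ multiplies a hidden output, scaling the columns of the input weights by $c$ and the output weights by $d$ leaves the prediction unchanged while the zeroed nodes can be removed outright.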
4. Simulation Results
4.1. Experimental Setting
In this paper, comprehensive simulations are carried out to verify the performance of the proposed algorithm, in which comparisons with other state-of-the-art variable selection algorithms such as SBSMLP [37], NNGEOMLP [29], and dLASSOMLP [38] are performed. All algorithms are simulated under the same settings. The MLP structure is a typical three-layer configuration, in which the activation functions of the hidden and output layer are hyperbolic tangent and linear, respectively. The initial number of hidden nodes is determined by some trial runs. Training and testing data take up 80% and 20% of the overall dataset, respectively. 5-fold CV is employed in the algorithm. The performance of the involved algorithms is assessed with the following five measures.
(1) MSE: the mean square error between the predicted and the actual value on the testing dataset, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
(2) Adjusted R-square ($R^2_{adj}$): $R^2_{adj} = 1 - \frac{(n-1)\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{(n-k-1)\sum_{i=1}^{n}(y_i - \bar{y})^2}$, where $n$ is the number of testing samples, $k$ is the number of input variables, and $\bar{y}$ is the mean value of the output variable.
(3) Neurons: the total number of the input and hidden nodes in the optimized MLP.
(4) False-positive selection (FS+): the number of irrelevant variables included in the optimized MLP.
(5) False-negative selection (FS−): the number of relevant variables excluded from the optimized MLP.
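The accuracy and selection measures are simple to compute once predictions and the selected variable set are available. The following sketch assumes the standard definition of adjusted R-square with $k$ predictors and represents variable sets by their column indices; the function names are illustrative.

```python
def mse(y, y_hat):
    """Mean square error between actual and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def adjusted_r2(y, y_hat, k):
    """Standard adjusted R^2 with k predictors (requires len(y) > k + 1)."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def false_selections(selected, relevant):
    """Return (FS+, FS-) for the selected and the truly relevant variables."""
    selected, relevant = set(selected), set(relevant)
    return len(selected - relevant), len(relevant - selected)

# Made-up example: variables 0..2 are relevant, the model kept 0, 1 and 5.
fs_plus, fs_minus = false_selections({0, 1, 5}, {0, 1, 2})
```

FS+ and FS− can only be evaluated on artificial data, where the set of truly relevant variables is known by construction.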
4.2. Simulation Results of Artificial Dataset
In this subsection, a nonlinear model proposed in [28] is used to generate the artificial datasets. The input dataset was drawn from a multivariate normal distribution whose covariance matrix is controlled by a collinearity parameter that sets the covariance between any two different variables (columns). The mathematical expression of the model, equation (10), follows [28]: the output is a nonlinear function of the 10 relevant variables with additive white Gaussian noise. Besides the relevant variables, an irrelevant dataset is produced, which makes this case a problem of selecting 10 relevant variables out of 50 variables.
Table 1 presents the statistical results on the artificial dataset with different algorithms after 20 runs. In this case, the collinearity parameter of the covariance matrix is set to 0.8, which generates a dataset with a high correlation between different variables. According to the numerical comparison of MSE and adjusted R-square, the GNNGMLP has the highest prediction accuracy among all algorithms. Furthermore, its FS+ is the smallest, which indicates that the GNNGMLP selects fewer irrelevant variables than the other approaches. From a comprehensive comparison of FS+ and FS−, it can be concluded that our algorithm selects relevant variables with more precision. Besides, the statistical results for the number of neurons show that GNNGMLP can effectively remove the redundant nodes and thereby improve the performance of the model. The results indicate that our algorithm addresses the redundancy of input variables and of the model structure simultaneously.
In addition, the capability of the different algorithms is further compared by changing the value of the collinearity parameter. Figure 3 shows the comparison of the five indicators for different values of this parameter. It can be seen that the GNNGMLP consistently yields the lowest MSE, meaning that our algorithm has the best accuracy across the tested values. The number of hidden layer nodes with GNNGMLP is always the lowest, which indicates the efficiency of our approach in reducing redundancy. Moreover, our algorithm also performs the best on the other indicators in most cases, which demonstrates its stability.
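Correlated candidate inputs of this kind can be generated without any linear algebra library. The sketch below assumes, purely for illustration, the common structure $\mathrm{corr}(x_i, x_j) = \rho^{|i-j|}$ with unit variances, which is not necessarily the covariance used in this study; it does not reproduce the nonlinear output model. The names `correlated_inputs` and `sample_corr` are hypothetical.

```python
import math
import random

def correlated_inputs(n, p, rho, seed=0):
    """Draw n samples of p standard normal variables whose correlation is
    rho ** |i - j| (an assumed structure), using an AR(1) recursion."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)   # keeps every column at unit variance
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(p - 1):
            row.append(rho * row[-1] + scale * rng.gauss(0.0, 1.0))
        rows.append(row)
    return rows

def sample_corr(rows, i, j):
    """Empirical correlation between columns i and j."""
    n = len(rows)
    mi = sum(r[i] for r in rows) / n
    mj = sum(r[j] for r in rows) / n
    cov = sum((r[i] - mi) * (r[j] - mj) for r in rows) / n
    vi = sum((r[i] - mi) ** 2 for r in rows) / n
    vj = sum((r[j] - mj) ** 2 for r in rows) / n
    return cov / math.sqrt(vi * vj)

X = correlated_inputs(5000, 5, 0.8)
```

With $\rho = 0.8$ neighbouring columns are strongly correlated, which is what makes separating relevant from irrelevant variables difficult in the high-collinearity setting.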
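Figure 3: Comparison of the five performance indicators for different values of the collinearity parameter, panels (a)–(e).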
4.3. Application to an Actual Desulfurization Process of Power Plant
In this section, the developed algorithm is applied to forecast the SO_{2} emissions from a desulfurization process of a thermal power plant in China. The structural diagram of the process is shown in Figure 4. This power plant adopts limestone-gypsum wet flue gas desulfurization technology, which includes an SO_{2} absorption system, a flue gas system, and a compressed air system. The technology mainly uses lime and limestone to absorb SO_{2} through the chemical reactions given in formulas (11) and (12).
The limestone slurry entering the primary absorption tower is dissolved in the absorption tower slurry pool. By adjusting the amount of limestone slurry entering the absorption tower or the concentration of the slurry discharged from the absorption tower, the pH value of the absorption tower slurry pool is maintained between 5.5 and 6.5 to ensure limestone dissolution and SO_{2} absorption. After the original flue gas enters the primary absorption tower, it passes through the spray zone in countercurrent, comes into full contact with the slurry so that SO_{2} is absorbed, and then enters the secondary absorption tower. There, the remaining SO_{2} and other harmful components in the flue gas are absorbed in the spray zone. Finally, dust is removed by a wet dust collector and the cleaned flue gas is discharged to the chimney. The two absorption towers have almost the same structure, which is shown in Figure 5.
Table 2 presents the statistical results of 20 runs with different soft sensor algorithms. It can be found that GNNGMLP has better prediction accuracy with a smaller number of neurons than other approaches. This result shows that GNNGMLP can improve the accuracy of the model by simplifying the internal structure of the MLP.
Figure 6 shows the comparison of predictive and actual value of the target variable with our algorithm. Obviously, the proposed algorithm can effectively track the dynamic change of the target variable.
To further examine the accuracy of the proposed algorithm, error comparisons between the measured and the predicted SO_{2} concentration with different algorithms are presented in Figure 7. The results show that the error of GNNGMLP is the lowest and lies within the range [−4.2, 4.2] in most instances, which meets the requirements of field operation. The performance of the developed soft sensor therefore complies with the standards demanded by industry.
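Figure 7: Error comparison between the measured and the predicted SO_{2} concentration with different algorithms, panels (a)–(e).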
Besides, comparative analyses based on the statistical results of variable selection and the actual industrial operating experience are given. Figure 8 presents the frequency of input variable selection over 100 runs. It can be found from Figure 8 that variable 13 is included in all solutions, and variables 17 and 30 are selected more than 80 times.
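Selection frequencies of this kind are obtained by counting, over repeated runs, how often each variable survives in the optimized model. A minimal sketch follows; the run outcomes in the example are invented for illustration and the function name `selection_frequency` is hypothetical.

```python
from collections import Counter

def selection_frequency(runs):
    """Count how many runs selected each variable index."""
    counts = Counter()
    for selected in runs:
        counts.update(set(selected))   # each variable counts once per run
    return counts

# Hypothetical outcome of three runs (variable indices are illustrative).
runs = [{13, 17, 30}, {13, 17}, {13, 30, 4}]
freq = selection_frequency(runs)
always = sorted(v for v, k in freq.items() if k == len(runs))
```

Variables that appear in every run, such as the single entry of `always` in this toy example, are the most stable candidates for comparison with operating experience.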
According to the statistics, the input variable most relevant to the output variable is variable 13. According to the system manual, variable 13 is the SO_{2} concentration of #91AT outlet’s flue gas and the SO_{2} concentration of #92AT inlet’s flue gas. Obviously, this variable is highly related to the SO_{2} concentration of the final emission. Variable 17, that is, the limestone slurry to #9 AT flow, has a selection frequency of 90%. It can be seen from formulas (11) and (12) that the CaO and CaCO_{3} in the limestone slurry can absorb the released SO_{2}. Therefore, variable 17 is included in the optimal solution. Variable 30 is the pH value of the slurry in tower 92. The slurry absorbs more SO_{2} when the SO_{2} concentration in the flue gas is relatively high. As a result, a large amount of hydrogen ions is generated, and the pH value decreases.
5. Conclusions
This paper proposed a new optimization algorithm for MLP-based soft sensors with NNG. The advantage of this algorithm is that it simultaneously performs the selection of the input layer and the optimization of the hidden layer of the MLP and is therefore more likely to reach the globally optimal model. The simulation results on the artificial datasets demonstrate that GNNGMLP has clear advantages in both the number of neurons and the generalization performance of the model. In addition, the algorithm is applied to forecast the SO_{2} emission in a desulfurization process to verify the reading of the online analyzer. Comprehensive results and comparisons show that the developed soft sensor achieves both model simplicity and accuracy. The proposed soft sensor can be further used for the optimization and control design of the desulfurization process.
Data Availability
The data used to support the findings of this study are currently under embargo, while the research findings are commercialized. Requests for data 24 months after publication of this article will be considered by the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
The work was supported by the Key Research and Development Program of Shandong Province (Grant no. 2019GGX104037).