Abstract

With the continuous improvement of automation in industrial production, industrial process data tend to arrive continuously in many cases. The ability to handle large amounts of data incrementally and efficiently is therefore indispensable for modern machine learning (ML) algorithms. According to the characteristics of the industrial production process, we propose an incremental learning ensemble strategy (ILES) that incorporates incremental learning to extract information efficiently from constantly incoming data. The ILES aggregates multiple sublearning machines with different weights for better accuracy. When a new data set arrives, a new sublearning machine is trained and aggregated into the ensemble soft sensor model according to its weight, and the weights of the other sublearning machines are updated at the same time, yielding an updated ensemble soft sensor model. The weight updating rules are designed around the prediction accuracy of the sublearning machines on newly arrived data, so the update can track changes in the data and acquire new information efficiently. A sizing percentage soft sensor model is established to learn from the production data of the industrial sizing process and to test the performance of the ILES, where the extreme learning machine (ELM) is selected as the sublearning machine. The new method is compared with the single ELM, AdaBoost.R ELM, and OS-ELM, and its extensions are tested with three test functions. The experimental results demonstrate that the soft sensor model based on the ILES has the best accuracy and the best online updating ability.

1. Introduction

During industrial processes, plants are usually heavily instrumented with a large number of sensors for process monitoring and control. However, there are still many process parameters that cannot be measured accurately because of high temperature, high pressure, complex physical and chemical reactions, large delays, and so on. Soft sensor technology provides an effective way to solve these problems. The original and still dominant application area of soft sensors is the prediction of process variables that can be determined either at low sampling rates or through off-line analysis only. Because these variables are often related to product quality, they are very important for process control and quality management. Additionally, soft sensors are usually applied for online prediction during production.

Currently, with the continuous improvement of automation in industrial production, large amounts of industrial process data can be measured, collected, and stored automatically. These data provide strong support for establishing data-driven soft sensor models. Meanwhile, with the rapid development and wide application of big data technology, soft sensor technology is already widely used and plays an essential role in the development of industrial process detection and control systems. Artificial intelligence and machine learning, as the core technologies, are receiving increasing attention. Traditional machine learning generally refers to a single learning machine that is trained on a training set and then used to predict unknown samples. However, single learning machine models suffer from inherent defects, such as unsatisfactory accuracy and generalization performance, especially for complex industrial processes. Specifically, in the supervised machine learning approach, the model's hypothesis is produced from predefined labeled instances and used to predict new incoming instances. When multiple hypotheses that support the final decision are aggregated together, this is called ensemble learning. Compared with a single learning machine model, the ensemble learning technique is beneficial for improving quality and accuracy. Therefore, a growing number of researchers study how to improve the speed, accuracy, and generalization performance of ensemble algorithms rather than developing strong learning machines.

Ensemble algorithms were originally developed for solving binary classification problems [1], and then AdaBoost.M1 and AdaBoost.M2 were proposed by Freund and Schapire [2] for solving multiclassification problems. Thus far, there are many different versions of boosting algorithms for solving classification problems [3-7], such as boosting by filtering and boosting by subsampling. However, for regression problems, the output cannot be predicted exactly as in classification. To solve regression problems using ensemble techniques, Freund and Schapire [2] extended AdaBoost.M2 to AdaBoost.R, which projects the regression sample into a classification data set. Drucker [8] proposed the AdaBoost.R2 algorithm, which is an ad hoc modification of AdaBoost.R. Avnimelech and Intrator [9] extended the boosting algorithm to regression problems by introducing the notion of weak and strong learning as well as an appropriate equivalence theorem between the two. Feely [10] proposed the BEM (big error margin) boosting method, which is quite similar to AdaBoost.R2. In BEM, the prediction error is compared with the preset threshold value, BEM, and the corresponding example is classified as either well or poorly predicted. Shrestha and Solomatine [11] proposed AdaBoost.RT, whose idea is to filter out the examples with relative estimation errors higher than a preset threshold value. However, the threshold value is difficult to set without experience. To solve this problem, Tian and Mao [12] presented a modified AdaBoost.RT algorithm that adaptively adjusts the threshold value according to the change in RMSE. Although ensemble algorithms can enhance the accuracy of soft sensors, they remain at a loss for the further information contained in newly arriving data.

In recent years, with the rapid growth of data size, a fresh research perspective has arisen to face the large amount of unknown important information contained in incoming data streams in various fields. How can we obtain methods that quickly and efficiently extract information from constantly incoming data? Batch learning is impractical here; the algorithm needs real-time processing capability because industrial processes demand real-time updates. The idea of incremental learning helps solve this problem: a learning machine with incremental learning ability can learn new knowledge from new data sets while retaining old knowledge without accessing the old data sets. Thus, the incremental learning strategy can greatly increase the processing speed for new data while also saving storage space. Ensemble learning methods can be improved by combining the characteristics of the ensemble strategy and incremental learning, which is an effective and suitable way to solve the problem of stream data mining [13-15]. Learn++ is a representative ensemble algorithm with incremental learning ability, designed by Polikar et al. based on AdaBoost and supervised learning [16]. In Learn++, the new data are also assigned sample weights, which are updated according to the classification results at each iteration; then a newly trained weak classifier is added to the ensemble classifier. Based on the Learn++.CDS algorithm, Ditzler and Polikar proposed Learn++.NIE [17] to improve the classification of minority categories. Most research on incremental learning and ensemble algorithms focuses on classification, while research on regression remains scarce. Meanwhile, the limitation of ensemble approaches is that they cannot well address the essential problem of incremental learning: accumulating experience over time and then adapting and using it to facilitate future learning [18-20].

For industrial production processes, more and more intelligent methods are used in soft sensors with the fast development of artificial intelligence. However, the practical applications of soft sensors in industrial production often perform poorly. Common shortcomings of soft sensors are unsatisfactory and unstable prediction accuracy and poor online updating ability, which make it difficult to cope with the variety of changes in industrial production. Therefore, in this paper, we mainly focus on how to add incremental learning capability to the ensemble soft sensor modeling method and hopefully provide useful suggestions to enhance both the generalization and online application abilities of soft sensors for industrial processes. Aiming at the demands of soft sensors for industrial applications, a new detection strategy is proposed that ensembles multiple learning machines to improve the accuracy of soft sensors based on intelligent algorithms. Additionally, in practical production applications, acquiring information from new production data is expensive and time consuming. Consequently, it is necessary to update the soft sensor in an incremental fashion to accommodate new data without compromising the performance on old data. In practice, most traditional intelligent prediction models for industrial processes neglect updating. Some models use traditional updating methods that retrain the models with all production data, or with only the updated data while forgoing the old data. Such methods are not good enough because previously acquired performance has to be sacrificed to learn new information [21]. Against this background, we present a new incremental learning ensemble strategy with better incremental learning ability for establishing the soft sensor model, which can learn additional information from new data while preserving previously acquired knowledge. The update does not require the original data that were used to train the existing old model.

In the rest of this paper, we first describe the details of the incremental learning ensemble strategy (ILES), which involves the strategy of updating the weights, the ensemble strategy, and the strategy of incremental learning for real-time updating. Then, we design experiments to test the performance of the ILES for industrial process soft sensors: the sizing percentage soft sensor model is built by the ILES for the sizing production process, the parameters of the ILES are discussed, and the performance of the ILES is compared with that of other methods. To verify the universal applicability of the new algorithm, three test functions are used to test the improvement in predictive performance. Finally, we summarize our conclusions and highlight future research directions.

2. The Incremental Learning Ensemble Strategy

The industrial process needs soft sensors with good accuracy and online updating performance. Here, we focus on incorporating the incremental learning idea into an ensemble regression strategy to achieve better soft sensor performance. A new ensemble strategy for industrial process soft sensors, called the ILES, is proposed, which combines the ensemble strategy with the incremental learning idea. The ILES enhances soft sensor accuracy by aggregating multiple sublearning machines according to their training and prediction errors. Additionally, during the iteration process, incremental learning is added to obtain the information in new data by updating the weights, which benefits the real-time updating ability of online industrial process soft sensors. The details of the ILES are shown in Algorithm 1.

Input
(i) $m$ sub datasets $S_k$ are drawn from the original data set $S$. Here, $k = 1, 2, \dots, m$.
(ii) The number of sub learning machines is $n$.
(iii) The coefficients of determination are $\varphi$ and $\Phi$.
For $k = 1, 2, \dots, m$
  Initialize $w_1(i) = 1/N$. Here, $i = 1, \dots, N$, and $N$ is the amount of data in $S_k$.
  For $t = 1, 2, \dots, n$
(1) Calculate $D_t(i) = w_t(i)/\sum_{j=1}^{N} w_t(j)$. $D_t$ is a distribution, and $w_t(i)$ is the weight of sample $i$.
(2) Randomly choose the training sub dataset $TR_t$ and the testing sub dataset $TE_t$ according to $D_t$.
(3) The sub learning machine is trained by $TR_t$ to obtain a soft sensor model $h_t$.
(4) Calculate the error of $h_t$ using $TR_t$ and $TE_t$: $e_t^{TR} = \sum_{i \in TR_t:\, ARE_t(i) > \varphi} D_t(i)$ and $e_t^{TE} = \sum_{i \in TE_t:\, ARE_t(i) > \varphi} D_t(i)$, where $ARE_t(i) = |h_t(x_i) - y_i|/y_i$.
(5) Calculate the error rate $\varepsilon_t = e_t^{TR} + e_t^{TE}$. If $\varepsilon_t > 1/2$, give up $h_t$, and return to step (2).
(6) Calculate $\beta_t = \varepsilon_t^{\,r}$, where $r = 1$, $2$, or $3$. Obtain the ensemble soft sensor model $H_t$ according to $\beta_t$: $H_t(x) = \sum_{q=1}^{t} \log(1/\beta_q)\, h_q(x) \big/ \sum_{q=1}^{t} \log(1/\beta_q)$.
(7) Calculate the error rate $E_t$ of $H_t$ using $S_k$ (with $\Phi$ in place of $\varphi$). If $E_t > 1/2$, give up $H_t$, and return to step (2).
(8) Calculate $B_t = E_t/(1 - E_t)$ to update the weights: $w_{t+1}(i) = w_t(i) \cdot B_t^{\,1 - I[\,|H_t(x_i) - y_i|/y_i > \Phi\,]}$.
Output: Obtain the final ensemble soft sensor model according to $B_k$, the value of $B_t$ at the last iteration for each $S_k$: $H_{final}(x) = \sum_{k=1}^{m} \log(1/B_k)\, H_k(x) \big/ \sum_{k=1}^{m} \log(1/B_k)$.
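To make the flow of Algorithm 1 concrete, the following minimal Python sketch implements the reconstructed steps under the notation above. It is illustrative only: the subdatasets are assumed to be NumPy arrays, `train_sublearner` is a placeholder for any sublearner (e.g., an ELM) that returns a callable predictor, the composite is formed as a log-weighted average (a common regression analogue of weighted voting), and the retry logic and numerical guards are our own assumptions rather than details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(hyps, logw, X):
    # Weighted average of hypotheses: the regression analogue of weighted voting.
    logw = np.asarray(logw, dtype=float)
    preds = np.stack([h(X) for h in hyps])
    return logw @ preds / logw.sum()

def ile_train(subsets, train_sublearner, n, phi, Phi, r=2, max_retry=25):
    """Sketch of Algorithm 1. subsets = [(X_k, y_k), ...] playing the role of S_1..S_m."""
    ensemble = []
    for X, y in subsets:                                   # outer loop over S_k
        N = len(y)
        w = np.full(N, 1.0 / N)                            # initialize w_1(i) = 1/N
        hyps, logb = [], []
        B = 0.5                                            # fallback if no iteration passes
        t = retries = 0
        while t < n and retries < max_retry:
            D = w / w.sum()                                # (1) distribution D_t
            tr = rng.choice(N, size=N, replace=True, p=D)  # (2) draw TR_t according to D_t
            h = train_sublearner(X[tr], y[tr])             # (3) hypothesis h_t
            are = np.abs(h(X) - y) / np.maximum(np.abs(y), 1e-12)
            eps = D[are > phi].sum()                       # (4)-(5) error rate eps_t
            if eps > 0.5:                                  # unqualified h_t: retry
                retries += 1
                continue
            hyps.append(h)
            logb.append(r * np.log(1.0 / max(eps, 1e-12)))  # log(1/beta_t), beta_t = eps^r
            Hx = ensemble_predict(hyps, logb, X)           # (6) composite H_t
            areH = np.abs(Hx - y) / np.maximum(np.abs(y), 1e-12)
            E = D[areH > Phi].sum()                        # (7) error rate E_t of H_t
            if E > 0.5:                                    # unqualified H_t: retry
                hyps.pop(); logb.pop(); retries += 1
                continue
            B = max(E, 1e-12) / (1.0 - E)                  # (8) B_t = E_t / (1 - E_t)
            w = np.where(areH <= Phi, w * B, w)            # shrink well-predicted weights
            t += 1
        ensemble.append((hyps, logb, np.log(1.0 / B)))     # store H_k with log(1/B_k)
    return ensemble
```

Because $B_t < 1$ whenever $E_t < 1/2$, multiplying the weights of well-predicted samples by $B_t$ shifts the normalized distribution toward the poorly predicted samples, which is exactly the reweighting behavior described in Section 2.1.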
2.1. Strategy of Updating the Weight

In each iteration of $k$, the initial weight $w_1(i) = 1/N$ is distributed to each sample with the same value. This means that the samples have the same chance to be included in the training dataset or the testing dataset at the beginning. In the subsequent iterations, the weight $w_t(i)$ will be calculated for every sublearning machine (in each iteration of $t$). In contrast to the traditional AdaBoost.R, here a testing subdataset $TE_t$ is added to test the learning performance in each iteration. It is useful for ensuring the generalization performance of the ensemble soft sensors. Then, the distribution $D_t$ will be changed according to the training and testing errors at the end of each iteration. Here, the training subdataset $TR_t$ and the testing subdataset $TE_t$ will be randomly chosen according to $D_t$ (for example, by using the roulette method). The sublearning machine is trained by $TR_t$, and a hypothesized soft sensor $h_t$ will be obtained. Then, the training error and testing error of $h_t$ can be calculated as follows:

$$e_t^{TR} = \sum_{i \in TR_t:\, ARE_t(i) > \varphi} D_t(i), \qquad e_t^{TE} = \sum_{i \in TE_t:\, ARE_t(i) > \varphi} D_t(i), \qquad ARE_t(i) = \frac{|h_t(x_i) - y_i|}{y_i} \quad (1)$$

The error rate of $h_t$ on $S_k = TR_t \cup TE_t$ is defined as follows:

$$\varepsilon_t = e_t^{TR} + e_t^{TE} \quad (2)$$

If $\varepsilon_t > 1/2$, the submodel is regarded as an unqualified and suspicious model, and the hypothesis $h_t$ is given up. Otherwise, the power coefficient is calculated as $\beta_t = \varepsilon_t^{\,r}$, where $r = 1$, $2$, or $3$ (linear, square, or cubic). Here, $\varphi$ is the coefficient of determination. After $t$ iterations, the composite hypothesis $H_t$ can be obtained from the hypothesized soft sensors (sublearning machines) $h_1, \dots, h_t$. The training subdataset error, the testing subdataset error, and the error rate $E_t$ of $H_t$ are calculated similarly to those of $h_t$, with the coefficient of determination $\Phi$ in place of $\varphi$. In the same way, if $E_t > 1/2$, the hypothesis $H_t$ is given up. At the end of the iteration, according to the error rate $E_t$, the weight is updated as follows:

$$w_{t+1}(i) = w_t(i) \cdot B_t^{\,1 - I[\,|H_t(x_i) - y_i|/y_i > \Phi\,]} \quad (3)$$

where $B_t = E_t/(1 - E_t)$. In the next iteration, $TR_{t+1}$ and $TE_{t+1}$ will be chosen again according to the new distribution $D_{t+1}$, which is calculated from the new weight $w_{t+1}$. During the above iterative process, the updating of the weights depends on the training and testing performance of the sublearning machines on different data. Therefore, the data with large errors receive a larger share of the distribution because they are difficult to learn. This means that the "difficult" data will have more chances to be trained until the information in the data is obtained. Conversely, the sublearning machines, or hypothesized soft sensors, are retained selectively based on their performance, so the final hypothesized soft sensors are well qualified to form the composite hypothesis. This strategy is very effective for improving the accuracy of ensemble soft sensors.
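As a concrete numerical illustration of update (3) (a toy sketch, not the authors' code): with $E_t = 0.2$ we get $B_t = 0.25$, so samples well predicted by $H_t$ have their weights multiplied by 0.25, while poorly predicted samples keep their weights and thus gain share after normalization.

```python
import numpy as np

# Four samples with equal weights; the last one is poorly predicted by H_t.
w = np.array([0.25, 0.25, 0.25, 0.25])
well_predicted = np.array([True, True, True, False])   # |H_t(x_i)-y_i|/y_i <= Phi ?
E_t = 0.2
B_t = E_t / (1 - E_t)                                  # B_t = 0.25
w[well_predicted] *= B_t                               # update (3)
D_next = w / w.sum()                                   # next distribution D_{t+1}
print(D_next)                                          # -> [0.143 0.143 0.143 0.571]
```

After one update, the "difficult" sample holds more than half of the sampling probability, so it is far more likely to enter the next training subdataset.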

2.2. Strategy of Ensemble with Incremental Learning

Aiming at the needs of real-time updating, the incremental learning strategy is integrated into the ensemble process. First, the subdatasets $S_k$, $k = 1, \dots, m$, are selected randomly from the dataset. In each iteration $t$, the sublearning machines are trained and tested. Therefore, for each subdataset, when the inner loop ($t = 1, \dots, n$) is finished, $n$ hypothesized soft sensors are generated. An ensemble soft sensor is obtained based on the combined outputs of the individual hypotheses, which constitute the composite hypothesis $H_k$:

$$H_k(x) = \frac{\sum_{t=1}^{n} \log(1/\beta_t)\, h_t(x)}{\sum_{t=1}^{n} \log(1/\beta_t)} \quad (4)$$

Here, the better hypotheses are aggregated with larger weights. Therefore, the best performance of the ensemble soft sensor is ensured based on these sublearning machines. Then, the training subdataset error and testing subdataset error of $H_k$ can be calculated similarly to the errors of $h_t$:

$$e_k^{TR} = \sum_{i \in TR:\, |H_k(x_i) - y_i|/y_i > \Phi} D(i), \qquad e_k^{TE} = \sum_{i \in TE:\, |H_k(x_i) - y_i|/y_i > \Phi} D(i) \quad (5)$$

The error rate of $H_k$ on $S_k$ is defined as follows:

$$E_k = e_k^{TR} + e_k^{TE} \quad (6)$$

After $n$ hypotheses are generated for each subdataset, the final hypothesis is obtained by the weighted majority voting of all the composite hypotheses:

$$H_{final}(x) = \frac{\sum_{k} \log(1/B_k)\, H_k(x)}{\sum_{k} \log(1/B_k)}, \qquad B_k = \frac{E_k}{1 - E_k} \quad (7)$$

When new data come in constantly during the industrial process, new subdatasets will be generated (they become $S_{m+1}$, $S_{m+2}$, ...). Based on a new subdataset, a new hypothesized soft sensor can be trained in a new iteration. The new information in the new data is obtained and added to the final ensemble soft sensor according to (7). Because of the added incremental learning strategy, the ensemble soft sensor is updated on top of the old hypotheses. Therefore, the information in the old data is also retained, and the increment of information from new data is achieved.
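Building on the sketch after Algorithm 1, the incremental update then amounts to training a new composite hypothesis on the new subdataset alone and appending it to the stored ensemble; prediction follows (7). The function names (`ile_update`, `ile_predict`) are illustrative assumptions, and `ile_train` / `ensemble_predict` are reused from the earlier sketch.

```python
import numpy as np

def ile_update(ensemble, new_subset, train_sublearner, n, phi, Phi, r=2):
    """Incremental step: learn H_{m+1} from the new subdataset (X_new, y_new) only
    and append it. Old data are never revisited; only trained hypotheses are kept."""
    ensemble.extend(ile_train([new_subset], train_sublearner, n, phi, Phi, r))
    return ensemble

def ile_predict(ensemble, X):
    """Final hypothesis (7): log(1/B_k)-weighted average of the composites H_k."""
    logW = np.array([lw for _, _, lw in ensemble])
    preds = np.stack([ensemble_predict(hyps, logb, X) for hyps, logb, _ in ensemble])
    return logW @ preds / logW.sum()
```

Note that the update cost depends only on the size of the new subdataset, which is the source of the time and storage savings discussed below.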

Overall, in the above ILES, the ensemble strategy efficiently improves the prediction accuracy by means of the changing distribution. The ILES gives more attention to the "difficult" data with big errors in every iteration, owing to the new distribution. Because the "difficult" data are learned harder, more information can be obtained. Therefore, the soft sensor model is built more completely, and the accuracy of prediction is improved. Moreover, the data used to train the sublearning machines are divided into a training subdataset and a testing subdataset, and the testing error is used in the subsequent steps: the weight update and the composite hypothesis ensemble. Therefore, the generalization of the soft sensor model based on the ILES can be improved efficiently, especially compared with traditional algorithms. Additionally, when new data are added, the ILES, with its incremental learning ability, can learn the new data in real time without giving up the old information from the old data. The ILES saves the information of the old sublearning machines that have been trained, but it does not need to save the original data; saving only a small amount of new production data is enough. This strategy efficiently saves storage space. Furthermore, the ILES may also save time compared with traditional updating methods, which is attributed to the conservation of the old composite hypotheses and sublearning machines in the final hypothesis (7).

3. Experiments

In this section, the proposed ILES is tested on the sizing production process for predicting the sizing percentage. First, the influence of each parameter on the performance of the proposed algorithm is discussed. Meanwhile, real industrial process data are used to establish the soft sensor model and verify the incremental learning performance of the algorithm. Finally, to prove its generalization performance, three test functions are used to verify the improvement of the prediction performance. The methods are implemented in MATLAB, and all experiments are performed on a PC with an Intel Core 7500U CPU (2.70 GHz per core) running Windows 10.

3.1. Sizing Production Process and Sizing Percentage

The double-dip and double-pressure sizing process is widely used in textile mills, as shown in Figure 1. The sizing percentage plays an important role in achieving good sizing quality, and the sizing agent control of warp sizing is essential for improving both productivity and product quality. Online detection of the sizing percentage is therefore a key factor for successful sizing control during the sizing process. The traditional detection methods, instrument measurement and indirect calculation, are either expensive or insufficiently accurate. Soft sensors provide an effective way to predict the sizing percentage and overcome these shortcomings. According to the mechanism analysis of the sizing process, the factors influencing the sizing percentage are slurry concentration, slurry viscosity, slurry temperature, the pressure of the first grouting roller, the pressure of the second grouting roller, the position of the immersion roller, the speed of the sizing machine, the cover coefficient of the yarn, the yarn tension, and the drying temperature [22]. In the following soft sensor modeling process, the inputs of the soft sensors are these nine influencing factors, and the output is the sizing percentage.

3.2. Experiments for the Parameters of the ILES

Here, we select the ELM as the sublearning machine of the ILES because of its good performance, such as fast learning speed and simple parameter choices [22, 23]; the appendix reviews the ELM. Experiments with different parameters of the ILES are then conducted to study how the performance of the ILES changes with the parameters.

First, experiments to assess the ILES algorithm's performance are done with different numbers of sublearning machines $n$. Here, $n$ increases from 1 to 15. Figure 2 shows the training errors and the testing errors with different $n$. Along with the increasing $n$, the training and testing errors decrease. When $n$ increases to 7, the testing error is the smallest. However, when $n$ increases to more than 9, the testing error becomes larger again, while the training errors decrease only slightly. Therefore, we can conclude that the performance of the ILES is best when $n = 7$. A comparison between AdaBoost.R and the ILES regarding the testing errors with different numbers of ELMs is shown in Figure 3. Although the RMSE means of AdaBoost.R and the ILES are different, their performance trends are similar as the number of ELMs increases. Here, the RMSE is described as

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2}$$

where $\hat{y}_i$ is the predicted value and $y_i$ is the actual value.

Second, we discuss the impact of changes in the parameters $\varphi$ and $\Phi$ on the ILES performance. The experiments demonstrate that when $\varphi$ is too small, the performance of the ELM can hardly achieve the preset goal, and the iteration is difficult to stop. $\Phi$ should also not be larger than 80 percent of the average relative error of the ELMs; otherwise, the composite hypothesis $H_t$ cannot be obtained. Furthermore, the value of $\varphi$ determines the number of "better samples". Here, the "better samples" are the samples whose predicted results reach the expected precision standard of the submachines. If $\varphi$ is too small, the ELM soft sensor model $h_t$ will not be obtained sufficiently often. If $\varphi$ is too large, "bad" ELM models $h_t$ will be aggregated into the final composite hypothesis $H_t$, and the accuracy of the ILES cannot be improved. The relationships among $\varphi$, $\Phi$, and the RMSE are shown in Table 1. With the best combination of $\varphi$ and $\Phi$ in Table 1, the model has the best performance (the RMSE is 0.3084).

3.3. Experiments for the Learning Process of the ILES

For establishing the soft sensor model based on the ILES, a total of 550 observations of real production data are collected from Tianjin Textile Engineering Institute Co., Ltd., of which 50 are selected randomly as testing data. The remaining 500 observations are divided into two data sets according to the time of production: the first 450 are used as the training data set, and the last 50 are used as the update data set. The inputs are the 9 factors that affect the sizing percentage, and the output is the sizing percentage. The parameters of the ILES are $m = 9$ and $n = 7$, with $\varphi$ and $\Phi$ set as determined in Section 3.2; that is, the 450 training data are divided into 9 subdatasets $S_1, \dots, S_9$, and the number of ELMs is 7. According to the needs of the sizing production, the predictive accuracy of the soft sensors is defined as

$$Accuracy = \frac{N_{0.6}}{N_{total}} \times 100\%$$

where $N_{0.6}$ is the number of predictions with an absolute error $< 0.6$ and $N_{total}$ is the total number of testing times.
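This accuracy measure translates directly into a few lines of code; a minimal sketch, with the 0.6 tolerance taken from the production requirement above and the function name being our own:

```python
import numpy as np

def sizing_accuracy(y_pred, y_true, tol=0.6):
    """Percentage of test predictions whose absolute error is below tol."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return 100.0 * np.mean(np.abs(y_pred - y_true) < tol)
```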

The learning process is similar to the update process of OS-ELM [24], an online sequential model that can update network parameters based on newly arriving data without retraining on historical data. Therefore, in assessing the accuracy of the ILES learning process, it is also compared with OS-ELM. The two columns on the right side of Table 2 show the changes in soft sensor accuracy during the learning processes of the ILES and OS-ELM. It can be seen that the stability and accuracy of the ILES are superior to those of OS-ELM.

3.4. Comparison

In this experiment, we used 10-fold cross validation to test the model's performance. The first 500 samples are randomly divided into 10 subdatasets $S_1 \sim S_{10}$. The remaining 50 samples are used as the update data set $S_{11}$. A single subdataset from $S_1 \sim S_{10}$ is retained in turn as the validation data for testing the model, and the remaining 9 subdatasets are used as the training data. To compare the new method with other soft sensor methods, a single ELM and an ensemble ELM based on AdaBoost.R are also applied to build sizing percentage soft sensor models as traditional methods with the same data set. The soft sensor models are listed as follows; a sketch of this data partition is given after the list.

Single ELM model: the main factors that affect the sizing percentage are the inputs of the model. The input layer of the ELM has 9 nodes, the hidden layer has 2300 nodes, and the output layer has one node, which is the sizing percentage.

AdaBoost.R model: the parameters and structure of the base ELMs are the same as those of the single ELM. AdaBoost.R runs 13 iterations.

ILES model: the ELMs are the same as in the single ELM model described above. The parameters of the ILES are set as determined in Section 3.2 (time consumed: 163 s).
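For reference, a minimal sketch of the cross-validation partition described above (variable names are our own; the commented training call stands in for any of the three models):

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(500)              # the first 500 samples, shuffled
folds = np.array_split(idx, 10)         # S_1 .. S_10 for 10-fold cross validation
update_idx = np.arange(500, 550)        # S_11: the 50 newest samples for updating

for v, val_idx in enumerate(folds):
    train_idx = np.hstack([f for i, f in enumerate(folds) if i != v])
    # 1) train a model (single ELM / AdaBoost.R / ILES) on the 9 training folds,
    # 2) validate it on fold v,
    # 3) for the ILES, apply the incremental update with S_11 and re-evaluate.
```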

Figures 4(a)-4(c) show the predicted sizing percentages of the soft sensor models based on the different methods. The experiments demonstrate that the ILES strategy can improve the accuracy of the soft sensor model. In addition, the training errors of the above three models can all reach 0.2, but the testing errors of the AdaBoost.R and ILES prediction models are smaller than that of the single ELM, which means that the ensemble methods have better generalization performance. Table 3 shows the performance of the prediction models based on the different methods after updating. The results of the comparison experiments show that the soft sensor based on the new ILES has the best accuracy and the smallest RMSE. This result is attributed to the use of the testing subdataset in the ensemble strategy and to the incremental learning strategy during the learning process of the ILES algorithm. Overall, the accuracy of the soft sensor can meet the needs of actual production processes. Moreover, the incremental learning performance can support the application of industrial process soft sensors in practical production.

3.5. Experiments for the Performance of the ILES by Test Functions

To verify the universal applicability of the algorithm, three test functions are used to test the improvement of the prediction performance: Friedman#1, Friedman#2, and Friedman#3. Table 4 shows the expression of each test model and the value range of each variable. Friedman#1 has a total of 10 input variables, five of which are associated with the output variable, while the other five are independent of it. The Friedman#2 and Friedman#3 test functions describe the impedance and phase change of an alternating current circuit.

Through continuous debugging, the parameters of each algorithm are determined as shown in Table 5. For every test function, a total of 900 data are generated; 78% of the samples are selected as training samples, 11% as updating samples, and 11% as testing samples, according to the needs of the different test models. That is to say, the 700 training data are divided into 7 subdatasets $S_1 \sim S_7$. Figures 5-7 show the predicted results of Friedman#1, Friedman#2, and Friedman#3 with the different soft sensor models (time consumed: 227 s). The comparison of the performances of the different soft sensors is shown in Table 6; the soft sensor model based on the ILES has the best performance.
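Since Table 4 is not reproduced here, the standard Friedman functions (available in scikit-learn with their usual variable ranges) can be used to regenerate comparable data; the split below mirrors the 700/100/100 partition described above. This is a sketch under those assumptions (including the noise level), not the authors' data pipeline.

```python
from sklearn.datasets import make_friedman1, make_friedman2, make_friedman3

for make in (make_friedman1, make_friedman2, make_friedman3):
    X, y = make(n_samples=900, noise=1.0, random_state=0)   # noise level assumed
    X_tr, y_tr = X[:700], y[:700]          # training: 7 subdatasets S_1..S_7 of 100
    X_up, y_up = X[700:800], y[700:800]    # updating samples (~11%)
    X_te, y_te = X[800:], y[800:]          # testing samples (~11%)
```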

4. Conclusions

An ILES algorithm is proposed to give industrial process soft sensors better accuracy and incremental learning ability. The sizing percentage soft sensor model is established to test the performance of the ILES. The main factors that influence the sizing percentage are the inputs of the soft sensor model. The ensemble model is trained with different subtraining datasets, and a soft sensor model with incremental learning performance is obtained by the ILES strategy. When new data accumulate to a certain amount, the model is updated using the incremental learning strategy. The new sizing percentage soft sensor model is used at Tianjin Textile Engineering Institute Co., Ltd. The experiments demonstrate that the new soft sensor model based on the ILES has good performance, and its predictive accuracy satisfies the requirements of sizing production. Finally, the new ILES is also tested with three test functions to verify its performance on different data sets for universal use. Because the size of the subdatasets differs from that in the sizing percentage experiment, it can be concluded from the prediction results that the size of the subdataset does not affect the performance of the algorithm. In the future, the ILES can also be applied to other complex industrial processes that require soft sensors.

Appendix

Review of Extreme Learning Machine

Single Hidden Layer Feedforward Networks (SLFNs) with Random Hidden Nodes

For $N$ arbitrary distinct samples $(x_j, t_j)$, where $x_j \in \mathbb{R}^d$ and $t_j \in \mathbb{R}^m$, standard SLFNs with $\tilde{N}$ hidden nodes and the activation function $g(x)$ are mathematically modeled as

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \qquad j = 1, \dots, N$$

where $w_i$ is the weight vector connecting the $i$th hidden node and the input nodes, $\beta_i$ is the weight vector connecting the $i$th hidden node and the output nodes, $o_j$ is the output vector of the SLFN, and $b_i$ is the threshold of the $i$th hidden node. $w_i \cdot x_j$ denotes the inner product of $w_i$ and $x_j$. The output nodes are chosen linear. The standard SLFNs with $\tilde{N}$ hidden nodes and the activation function $g(x)$ can approximate these $N$ samples with zero error, meaning that $\sum_{j=1}^{N} \|o_j - t_j\| = 0$; i.e., there exist $\beta_i$, $w_i$, and $b_i$ such that $\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = t_j$, $j = 1, \dots, N$. These $N$ equations can be written compactly as follows:

$$H\beta = T$$

where

$$H(w_1, \dots, w_{\tilde{N}}, b_1, \dots, b_{\tilde{N}}, x_1, \dots, x_N) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}$$

and

$$\beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_{\tilde{N}}^{T} \end{bmatrix}_{\tilde{N} \times m}, \qquad T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}_{N \times m}$$

Here, $H$ is called the hidden layer output matrix.

ELM Algorithm. The parameters of the hidden nodes do not need to be tuned and can be randomly generated once and for all according to any continuous probability distribution. The unique smallest-norm least-squares solution of the above linear system is

$$\hat{\beta} = H^{+} T$$

where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$.

Thus, a simple learning method for SLFNs, called extreme learning machine (ELM), can be summarized as follows.

Step 1. Randomly assign the input weights $w_i$ and biases $b_i$, $i = 1, \dots, \tilde{N}$.

Step 2. Calculate the hidden layer output matrix H.

Step 3. Calculate the output weight $\hat{\beta} = H^{+} T$.
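These three steps translate directly into a few lines of code. The following minimal Python sketch is an illustration, not the reference implementation of [23]: the sigmoid activation and the hidden-layer size are arbitrary choices, and `numpy.linalg.pinv` computes the Moore-Penrose inverse $H^{+}$.

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer ELM following Steps 1-3 above."""
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # g(w_i . x + b_i) with a sigmoid activation g
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        d = X.shape[1]
        self.W = self.rng.standard_normal((d, self.n_hidden))  # Step 1: random w_i
        self.b = self.rng.standard_normal(self.n_hidden)       #         and b_i
        H = self._hidden(X)                                    # Step 2: matrix H
        self.beta = np.linalg.pinv(H) @ y                      # Step 3: beta = H^+ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Usage is a single chained call, e.g., `y_hat = ELM(n_hidden=100).fit(X_tr, y_tr).predict(X_te)`; wrapped as a callable, such a model can serve as the `train_sublearner` placeholder in the ILES sketch of Section 2.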

The universal approximation capability of the ELM has been rigorously proved in an incremental method by Huang et al. [23].

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grants nos. 71602143, 61403277, 61573086, and 51607122), Tianjin Natural Science Foundation (no. 18JCYBJC22000), Tianjin Science and Technology Correspondent Project (no. 18JCTPJC62600), the Program for Innovative Research Team in University of Tianjin (nos. TD13-5038, TD13-5036), and State Key Laboratory of Process Automation in Mining & Metallurgy/Beijing Key Laboratory of Process Automation in Mining & Metallurgy Research Fund Project (BGRIMM-KZSKL-2017-01).