Incremental Multiple Hidden Layers Regularized Extreme Learning Machine Based on Forced Positive-Definite Cholesky Factorization

Liu, Jingyi; Le, Ba Tuan

doi:https://doi.org/10.1155/2019/6740523

Mathematical Problems in Engineering


Research Article | Open Access

Volume 2019 | Article ID 6740523 | https://doi.org/10.1155/2019/6740523


Jingyi Liu¹ and Ba Tuan Le²,³

Academic Editor: Alberto Olivares

Received 17 Oct 2018

Accepted 04 Apr 2019

Published 24 Apr 2019

Abstract

The theory and implementation of extreme learning machine (ELM) prove that it is a simple, efficient, and accurate machine learning method. Compared with other single hidden layer feedforward neural network algorithms, ELM is characterized by simpler parameter selection rules, faster convergence speed, and less human intervention. The multiple hidden layer regularized extreme learning machine (MRELM) inherits these advantages of ELM and has higher prediction accuracy. In the MRELM model, the number of hidden layers is randomly initiated and fixed, and there is no iterative tuning process. However, the optimal number of hidden layers is the key factor to determine the generalization ability of MRELM. Given this situation, it is obviously unreasonable to determine this number by trial and random initialization. In this paper, an incremental MRELM training algorithm (FC-IMRELM) based on forced positive-definite Cholesky factorization is put forward to solve the network structure design problem of MRELM. First, an MRELM-based prediction model with one hidden layer is constructed, and then a new hidden layer is added to the prediction model in each training step until the generalization performance of the prediction model reaches its peak value. Thus, the optimal network structure of the prediction model is determined. In the training procedure, forced positive-definite Cholesky factorization is used to calculate the output weights of MRELM, which avoids the calculation of the inverse matrix and Moore-Penrose generalized inverse of matrix involved in the training process of hidden layer parameters. Therefore, FC-IMRELM prediction model can effectively reduce the computational cost brought by the process of increasing the number of hidden layers. 
Experiments on classification and regression problems indicate that the algorithm can be effectively used to determine the optimal network structure of MRELM, and the prediction model trained by the algorithm performs well in both prediction accuracy and computational cost.

1. Introduction

The neural network is a complex nonlinear system interconnected by a large number of neurons, and it is based on the research of human brain information processing ability by modern neurobiology and cognitive science. The neural network is also a mathematical simulation form of human brain physiological structure, with strong adaptability, self-learning ability, and nonlinear mapping, and it has been widely used by researchers in many scientific fields [1–3]. However, the above prediction models are all based on the traditional neural networks, and the network training process needs to modify the network weights repeatedly according to the training objectives and gradient information. The entire network training process usually takes hundreds or even thousands of iterations before it can be finally completed, which requires a large amount of calculation.

Extreme learning machine (ELM) is a novel single hidden layer feedforward neural network. It transforms the iterative adjustment process of traditional neural network parameter training into solving linear equations. According to Moore-Penrose generalized inverse matrix theory, the least squares solution with the minimum norm is obtained analytically as the network weights. The whole training process can be completed in one time without iteration. Compared with the traditional neural network training algorithm, which requires several iterations to determine the network weights, the training speed of ELM is significantly improved [4, 5]. This advantage enables ELM to be successfully applied in pattern recognition [6, 7] and regression estimation [8–10]. In order to improve the generalization ability of ELM, the literature [11] draws on the principle of structural risk minimization in statistical learning theory and proposes a regularized extreme learning machine (RELM). RELM has a better generalization ability by introducing parameters to weigh structural risks and empirical risks [12–15]. For the single hidden layer RELM model with multiple input and single output, the literature [16] designed the Cholesky factorization method for regularized output weight matrix. In the learning and forgetting process of the sample sequence, the Cholesky factorization factor is calculated recursively by adding and deleting samples one by one, and then the output weights are adjusted, and the network structure is fixed. However, if dealing with input data with complex noise signals and high-dimensional information, or with more classification categories, RELM also shows its own shortcomings, and the accuracy of the established model is greatly reduced.

To address these shortcomings of RELM, the literature [17, 18] starts from improving its network structure. On the basis of the traditional three-layer RELM structure, the number of hidden layers is increased to form a neural network with one input layer, multiple hidden layers, and one output layer, that is, the multiple hidden layers RELM network model (MRELM), in which the neuron nodes of each hidden layer are fully connected. MRELM inherits from RELM the idea of randomly initializing the weights matrix between the input layer and the hidden layer as well as the bias vector of the hidden layer. By forcing the actual output of the hidden layer to be as close as possible to the expected output, the weights matrix and the bias vector of the added hidden layers are calculated; thereby a neural network model with multiple hidden layers is established. The parameter training process needs to calculate the inverse matrix and the MP generalized inverse matrix: the first hidden layer parameters are randomly initialized, and the parameters of the remaining hidden layers are obtained by minimizing the error between the actual output and the expected output of the corresponding hidden layer. Compared with the traditional RELM model, MRELM can effectively improve the prediction accuracy through the layer-by-layer optimization of network parameters between different hidden layers. Moreover, it has the advantages of strong generalization ability and fast computing speed and does not easily fall into a local optimum [19–22]. However, although the random initialization of the MRELM parameters helps the algorithm avoid local optima and overfitting, it also causes some hidden layers to fail or to contribute less to the neural network during the modeling process. As a result, the MRELM network contains some redundant hidden layers, which calls for more reasonable methods and theories for selecting the number of hidden layers. Meanwhile, the network structure of MRELM is determined by users based on their own practical experience, but this empirical choice is not well founded, and its optimality is difficult to guarantee. In practical applications, users often need to repeat the experiments many times and, by comparing a large set of results, choose the network structure with the lowest time consumption and the highest accuracy as the optimal network model for the training and prediction of the actual data.

To design the MRELM network structure effectively, select the number of hidden layers reasonably, and meet the desired accuracy requirements, this paper puts forward an incremental MRELM training algorithm based on forced positive-definite Cholesky factorization (FC-IMRELM) [23, 24]. The algorithm adjusts the number of hidden layers in the network adaptively according to the predicted data, so as to determine the optimal network structure of FC-IMRELM. At the same time, a novel method is adopted to calculate the parameters of the newly added hidden layers, that is, the connection weight matrix and the bias vector of the hidden layers. Based on previous research, MRELM typically requires fewer hidden neurons than ELM to achieve a desirable performance level. This is a basic requirement for considering the multiple-hidden-layers structure presented. Compared with other ELM variants, the foundational ideas of the FC-IMRELM algorithm are simpler to implement and more stable. Experimental results for classification and regression problems show that the proposed FC-IMRELM algorithm achieves higher average accuracy than the traditional RELM model and other improved models of MRELM.

The rest of this paper is organized as follows: Section 2 presents a brief review of the basic concepts and related work of multiple hidden layers RELM, Section 3 describes the proposed incremental FC-IMRELM technique, Section 4 reports and analyzes the experimental results, and, finally, Section 5 summarizes key conclusions of the present study.

2. Brief Review of Multiple Hidden Layers Regularized Extreme Learning Machine

The MRELM algorithm tries to find a mapping relationship that makes the output predicted by the ELM neural network with multiple hidden layers infinitely close to the actual given result. This mapping relationship will be embodied in the solution process of the weight and bias parameters of the hidden layers. The number of hidden layers in the MRELM neural network needs to be selected according to the change of the predicted data. Therefore, in the training process of network parameters, in order to ensure that the final hidden layer output is closer to the expected hidden layer output, in addition to the random initialization of the parameters of the first hidden layer, the parameter training process starts from the second hidden layer to optimize the network parameters until all the network parameters are completed. Furthermore, during the establishment of the neural network, the weight matrix and bias vector of each hidden layer are acquired and recorded, so as to obtain the final predicted output result of the MRELM neural network. The solving process of the network parameters will be explained in detail in the following algorithm flow.

Suppose that the training dataset given to the MRELM neural network is $\{X, T\} = \{(x_i, t_i)\}_{i=1}^{N}$, where $X$ is the set of input samples, $x_i \in \mathbb{R}^n$ is the input vector, $T$ is the corresponding set of labeled samples, $t_i \in \mathbb{R}^m$ is the observation vector, and $N$ is the total number of training samples. Meanwhile, it is assumed that all hidden layers in the MRELM model contain the same number of hidden nodes $\tilde{N}$, and each hidden node uses the same activation function $g(\cdot)$. In the modeling process of the MRELM algorithm, the multiple hidden layers in the neural network are first treated as a single hidden layer, and then the hidden layer parameters of the MRELM network containing only a single hidden layer are randomly initialized, namely, the input weights matrix $W$ connecting the input layer and the first hidden layer, and the bias vector $b$ of the first hidden nodes. Thus, the output matrix of the first hidden layer can be calculated as follows:

$$H = \big[h_{ij}\big]_{N \times \tilde{N}}, \qquad h_{ij} = g(w_j \cdot x_i + b_j), \tag{1}$$

whose scalar entries $h_{ij}$ are interpreted as the output of the $j$th hidden node in the first hidden layer with respect to $x_i$, and $w_j$ is the vector of connection weights between the input nodes and the $j$th hidden node in the first hidden layer. To better balance the empirical risk and structural risk, the MRELM adjusts the proportion of the two risks by introducing the parameter $C$, which can be expressed as the following constrained optimization problem:

$$\min_{\beta,\,\xi}\ \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\|\xi\|^2 \quad \text{s.t.} \quad H\beta = T - \xi, \tag{2}$$

where $\beta = [\beta_1, \dots, \beta_{\tilde{N}}]^T$ is the connection weights matrix between the first hidden layer and the output layer, with vector components $\beta_j$ that denote the connection weights between the $j$th hidden node in the first hidden layer and the output nodes, $\xi$ denotes the training error, and $C$ is the regularization parameter.

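According to the KKT theorem, the constrained optimization of (2) can be transformed into the following dual optimization problem:

$$\mathcal{L}(\beta, \xi, \alpha) = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\|\xi\|^2 - \operatorname{tr}\!\big(\alpha^T (H\beta - T + \xi)\big), \tag{3}$$

where $\alpha$ is the Lagrange multipliers vector. Utilizing the KKT optimality conditions, the following equations can be obtained:

$$\frac{\partial \mathcal{L}}{\partial \beta} = 0 \ \Rightarrow\ \beta = H^T \alpha, \tag{4a}$$

$$\frac{\partial \mathcal{L}}{\partial \xi} = 0 \ \Rightarrow\ \alpha = C\xi, \tag{4b}$$

$$\frac{\partial \mathcal{L}}{\partial \alpha} = 0 \ \Rightarrow\ H\beta - T + \xi = 0. \tag{4c}$$

Finally, $\beta$ can be obtained as follows:

$$\beta = \Big(\frac{I}{C} + H^T H\Big)^{-1} H^T T \tag{5a}$$

or

$$\beta = H^T \Big(\frac{I}{C} + H H^T\Big)^{-1} T. \tag{5b}$$

In order to reduce the computational costs, if $N > \tilde{N}$, one may prefer to apply the solution (5a), and if $N < \tilde{N}$, one may prefer to apply the solution (5b).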
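To make the single-hidden-layer RELM solve concrete, the following NumPy sketch builds a sigmoid hidden layer with random weights and computes the regularized output weights, using the primal form $(I/C + H^T H)^{-1} H^T T$ when there are at least as many samples as hidden nodes and the dual form $H^T (I/C + H H^T)^{-1} T$ otherwise. The function names, the one-row-per-sample layout of `H`, and the uniform initialization range are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_output(X, W, b):
    """Hidden layer output H (n_samples x n_hidden) for inputs X (n_samples x n_features)."""
    return sigmoid(X @ W + b)

def relm_output_weights(H, T, C):
    """Regularized least-squares output weights; picks the smaller linear system."""
    n_samples, n_hidden = H.shape
    if n_samples >= n_hidden:
        # Primal form: an n_hidden x n_hidden system.
        A = H.T @ H + np.eye(n_hidden) / C
        return np.linalg.solve(A, H.T @ T)
    # Dual form: an n_samples x n_samples system.
    A = H @ H.T + np.eye(n_samples) / C
    return H.T @ np.linalg.solve(A, T)

# Toy data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
T = rng.normal(size=(50, 2))
W = rng.uniform(-1.0, 1.0, size=(4, 10))   # random input weights (assumed range)
b = rng.uniform(-1.0, 1.0, size=10)        # random hidden biases (assumed range)

H = hidden_output(X, W, b)
beta = relm_output_weights(H, T, C=100.0)
```

Both forms give the same weights in exact arithmetic; the choice only affects the size of the linear system that has to be solved.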

Now the second hidden layer is added to the MRELM neural network, the network structure with two hidden layers is restored, and the two hidden layers are fully connected, so the prediction output of the second hidden layer can be obtained as follows: where denotes the weights matrix between the first hidden layer and the second hidden layer. We suppose that the first and second hidden layers have the same number of nodes, and thus is a square matrix. The matrix represents the bias of the second hidden layer. The expected output of the second hidden layer can be calculated as where is the MP generalized inverse of the matrix , which can be calculated using the orthogonal projection method. Namely, if is nonsingular, then ; otherwise if is nonsingular. To make the predicted output of the hidden layer in the MRELM neural network as close as possible to the expected output, we may set . Subsequently, we define the augmented matrix , and it can be obtained as where is the MP generalized inverse of the matrix , and 1 represents a one-column vector of size N whose elements are the scalar unit 1. The solving method of is the same as previously discussed for . The notation indicates the inverse of the activation function . For both classification and regression problems, we use the widely used logistic sigmoid function $g(x) = 1/(1 + e^{-x})$. The predicted output of the second hidden layer is obtained as Therefore, the connection weights matrix between the second hidden layer and the output layer is calculated as or The solving method of is chosen according to what is previously discussed for .

According to the MRELM algorithm flow, the third hidden layer is added to the MRELM network, and the network structure with three hidden layers is restored. Since the nodes of adjacent hidden layers are fully connected, the prediction output of the third hidden layer can be obtained as where represents the weights matrix between the second hidden layer and the third hidden layer, and the vector denotes the bias of the third hidden layer. Thus, the expected output of the third hidden layer can be obtained as where is the MP generalized inverse of the weights matrix , obtained using the approach described before. To meet the requirement that the predicted output of the third hidden layer is as close as possible to the expected output, let . Accordingly, the augmented matrix can be defined as , and we can solve it as follows: where is the MP generalized inverse of the matrix , the specific meaning of the symbol 1 is described above, and the calculation of also proceeds in the manner discussed before. Therefore, we can update the predicted output of the third hidden layer as Finally, the connection weight matrix between the third hidden layer and the output layer can be calculated as or The calculation approach of is still selected according to the principle of discussed previously. The final output of the MRELM network with three hidden layers after training can be expressed as

If the number of hidden layers in the MRELM network is more than 3, an iterative format can be adopted to realize the calculation process. In other words, the calculation of formula (6) to formulas (15a) and (15b) is repeated until all hidden layer parameters are solved. Finally, it should be emphasized that this algorithm neither adds all hidden layers to the network at one time nor calculates all hidden layer parameters at one time; instead, hidden layers are added to the network one after another. Every time a new hidden layer is added, the weights matrix and the bias vector of that hidden layer are calculated immediately to prepare for the parameter calculation of the hidden layer to be added next.
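The layer-by-layer procedure of this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one row per sample, stacks the bias and weights of the new layer into a single augmented matrix `W_aug`, uses `np.linalg.pinv` for the MP generalized inverses, and clips the expected hidden output into (0, 1) before applying the inverse sigmoid (the clipping threshold `eps` is an assumption needed to keep the logarithm finite).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inverse_sigmoid(y, eps=1e-6):
    """Inverse of the logistic sigmoid; inputs are clipped into (0, 1) first."""
    y = np.clip(y, eps, 1.0 - eps)
    return np.log(y / (1.0 - y))

def ridge_weights(H, T, C):
    """Regularized output weights for hidden output H and targets T."""
    return np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)

def add_hidden_layer(H, beta, T, C):
    """Add one hidden layer on top of the current hidden output H."""
    # Expected output of the new layer: targets mapped back through pinv(beta).
    H_expected = T @ np.linalg.pinv(beta)
    # Augment H with a column of ones so the bias is solved with the weights.
    H_aug = np.hstack([np.ones((H.shape[0], 1)), H])
    W_aug = np.linalg.pinv(H_aug) @ inverse_sigmoid(H_expected)
    # Actual (predicted) output of the new layer and its output weights.
    H_next = sigmoid(H_aug @ W_aug)
    beta_next = ridge_weights(H_next, T, C)
    return W_aug, H_next, beta_next

# Toy regression problem (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 3))
T = np.sin(X.sum(axis=1, keepdims=True))

# First hidden layer: random weights and biases.
H = sigmoid(X @ rng.uniform(-1, 1, size=(3, 12)) + rng.uniform(-1, 1, size=12))
beta = ridge_weights(H, T, C=1e3)

# Add two more hidden layers, one after another.
layers = []
for _ in range(2):
    W_aug, H, beta = add_hidden_layer(H, beta, T, C=1e3)
    layers.append(W_aug)

prediction = H @ beta
```

Each call reuses the hidden output and output weights of the previous step, which mirrors the one-layer-at-a-time scheme described above.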

3. Solutions of IMRELM by the Forced Positive-Definite Cholesky Factorization

For the single hidden layer feed-forward neural network, the literature [15] puts forward the regularized extreme learning machine algorithm based on Cholesky factorization (CF-FORELM), introduces the Cholesky factorization of positive definite matrix into the solving process of RELM, and designs a recursive solution method for the calculation of the regularized output matrix Cholesky factorization factor. The advantages of CF-FORELM algorithm prompted us to introduce the forced positive-definite Cholesky factorization method into the framework of MRELM algorithm with multiple hidden layers, and we then proposed an MRELM neural network training algorithm based on forced positive definite Cholesky factorization (FC-IMRELM). Compared with the inverse matrix calculation of invertible matrix in traditional RELM algorithm and the calculation of MP generalized inverse of matrix in MELM algorithm ( denotes the number of hidden layers), the algorithm effectively reduces the computational cost and complexity brought by the matrix inverse process. Meanwhile, the numerical stability of the forced positive definite Cholesky factorization method also greatly weakens the randomness effect of the ELM algorithm on the prediction results.

3.1. Forced Positive-Definite Cholesky Factorization (FC)

The main difficulty of the MRELM algorithm is the calculation of the inverse matrix and the MP generalized inverse matrix involved in the training process, including the inversion of symmetric positive semidefinite matrices. In this case, an improved MRELM model based on the traditional Cholesky factorization cannot be realized, because the Cholesky factorization of a symmetric positive semidefinite matrix might not exist. Even if such a factorization exists, the calculation process is generally numerically unstable, because the elements of the factorization factor may be unbounded. To overcome these difficulties, we put forward a modified approach for the MRELM algorithm with multiple hidden layers based on the forced positive-definite Cholesky factorization, which is a numerically stable approach.

When the forced positive-definite strategy is adopted to improve the MRELM algorithm, the key problem is how to form a positive definite matrix from the modified Cholesky decomposition of a matrix whose definiteness is not known in advance. If the matrix $A$ is not positive definite, the Cholesky factorization method that forces the matrix to be positive definite finds a unit lower triangular matrix $L$ and a positive definite diagonal matrix $D$ for the general symmetric matrix $A$, so that the matrix

$$A + E = L D L^T \tag{17}$$

is positive definite and differs from $A$ only by a diagonal matrix $E$. In fact, the Cholesky factorization of a symmetric positive definite matrix can be described as follows:

$$d_j = a_{jj} - \sum_{s=1}^{j-1} l_{js}^2 d_s, \qquad l_{ij} = \frac{1}{d_j}\Big(a_{ij} - \sum_{s=1}^{j-1} l_{is} l_{js} d_s\Big), \quad i > j, \tag{18}$$

where $a_{ij}$ represents the element of matrix $A$ and $d_j$ denotes the $j$th main diagonal element of matrix $D$. Here, the Cholesky factorization factors $L$ and $D$ are required to satisfy two requirements: one is that all elements of $D$ are strictly positive, and the other is that the elements of the factorization factor $L D^{1/2}$ are uniformly bounded. That is, for $j = 1, \dots, n$ and a positive number $\eta$, the formula (19) is required:

$$d_j > \delta, \qquad |r_{ij}| \le \eta, \quad i > j, \tag{19}$$

where the auxiliary quantity $r_{ij} = l_{ij}\sqrt{d_j}$ and $\delta$ is a given small positive number. A matrix $A$ satisfying the above conditions is said to be sufficiently positive definite, in which case $E$ is a zero matrix.

Next, we describe the steps of this factorization. Suppose the column of the forced positive-definite Cholesky factorization has been calculated. For , equation (19) holds. First calculate where is taken as , and the test value is defined as where is a small positive number. In order to determine whether can accept as the element of , we check whether satisfies the formula (19). If so, let , and get the column of from . Otherwise, let , select positive number to make , and produce the column of .

If the above process is completed, we obtain the Cholesky factorization formula (17) of the positive definite matrix , where is a nonnegative diagonal matrix and the diagonal element is . For the given matrix , this nonnegative diagonal matrix depends on . If , where is the maximum norm of the nondiagonal elements of and is the maximum norm of the diagonal elements of . If , the upper bound is minimized. So, let satisfy formula (24):where represents the machine precision. We increase to prevent from being small.

Finally, we present the forced positive-definite Cholesky factorization algorithm, where the auxiliary quantity $c_{ij} = l_{ij} d_j$, $j = 1, \dots, n$, $i = j, \dots, n$. These values need not be stored separately; they can be stored in the matrix $A$.

Algorithm 1. Forced Positive-Definite Cholesky Factorization (FC)
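Step 1. Calculate the bounds of the elements of the factorization factor. Let $\eta^2 = \max\{\gamma, \zeta/\nu, \varepsilon_M\}$, where $\nu = \max\{1, \sqrt{n^2 - 1}\}$, $\gamma$ and $\zeta$ are the maximum norm of the diagonal and nondiagonal elements of $A$, respectively, and $\varepsilon_M$ is the machine precision.
Step 2. Initialization. Let $j = 1$, $c_{ii} = a_{ii}$, $i = 1, \dots, n$.
Step 3. Determine the minimum index $q$, so that $|c_{qq}| = \max_{j \le i \le n} |c_{ii}|$, and exchange the information of rows $q$ and $j$ and of columns $q$ and $j$ of $A$.
Step 4. Calculate the $j$th row of $L$, and solve the maximum norm of $l_{ij} d_j$. Let $l_{jk} = c_{jk}/d_k$, $k = 1, \dots, j-1$, calculate $c_{ij} = a_{ij} - \sum_{k=1}^{j-1} l_{jk} c_{ik}$, $i = j+1, \dots, n$, and let $\theta_j = \max_{j+1 \le i \le n} |c_{ij}|$. If $j = n$, let $\theta_j = 0$.
Step 5. Calculate the $j$th diagonal element of $D$: $d_j = \max\{\delta, |c_{jj}|, \theta_j^2/\eta^2\}$, where $\delta$ is a given small positive number. The diagonal element of $E$ is modified to $e_j = d_j - c_{jj}$. If $j = n$, stop.
Step 6. Correct the diagonal elements and column index, and let $c_{ii} = c_{ii} - c_{ij}^2/d_j$, $i = j+1, \dots, n$, and $j = j + 1$; jump to Step 3.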
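The steps of Algorithm 1 can be sketched in NumPy as shown below. This is a simplified illustration under stated assumptions: it follows the standard Gill-Murray style modified Cholesky recursion, it omits the row and column interchange of Step 3 for brevity, and the names `forced_cholesky`, `bound_sq`, and the default tolerance `delta` are choices made for this example. It returns a unit lower triangular `L`, the positive diagonal `d`, and the diagonal correction `e`, so that the input matrix plus `diag(e)` equals `L @ diag(d) @ L.T`.

```python
import numpy as np

def forced_cholesky(A, delta=1e-8):
    """Modified LDL^T factorization of a symmetric matrix A (no pivoting).

    Returns (L, d, e) with A + diag(e) = L @ diag(d) @ L.T and d > 0.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    diag = np.diag(A).copy()

    # Step 1: bound on the elements of the factor.
    gamma = np.max(np.abs(diag))                # largest |diagonal| entry
    zeta = np.max(np.abs(A - np.diag(diag)))    # largest |off-diagonal| entry
    nu = max(1.0, np.sqrt(n * n - 1.0))
    bound_sq = max(gamma, zeta / nu, np.finfo(float).eps)

    # Step 2: initialization.
    L = np.eye(n)
    d = np.zeros(n)
    e = np.zeros(n)
    C = np.zeros((n, n))    # auxiliary quantities c_ij = l_ij * d_j
    c_diag = diag           # running values of c_jj

    for j in range(n):
        # Step 4: row j of L and column j of C below the diagonal.
        L[j, :j] = C[j, :j] / d[:j]
        C[j + 1:, j] = A[j + 1:, j] - C[j + 1:, :j] @ L[j, :j]
        theta = np.max(np.abs(C[j + 1:, j])) if j + 1 < n else 0.0

        # Step 5: forced positive diagonal element and its correction.
        d[j] = max(delta, abs(c_diag[j]), theta ** 2 / bound_sq)
        e[j] = d[j] - c_diag[j]

        # Step 6: update the remaining diagonal entries.
        c_diag[j + 1:] -= C[j + 1:, j] ** 2 / d[j]

    return L, d, e

# An indefinite symmetric matrix: the correction e is nonzero here.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
L, d, e = forced_cholesky(A)
```

For a matrix that is already sufficiently positive definite, the correction `e` comes out as zero and the routine reduces to an ordinary $LDL^T$ factorization.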

3.2. Process of Matrices Decomposition for FC-IMRELM

According to the MRELM training process shown in equations (1) to (16), its essence is to solve the connection weight matrix between the hidden layer and the output layer. However, it can be seen from equations (5a), (5b), (10a), (10b), (15a), and (15b) that the solution method given in literature [18] involves matrix inversion, and the solution process of each hidden layer's parameters ( is the number of hidden layers) involves the calculation of the MP generalized inverse of matrix. The problem of large amount of calculation reduces the modeling efficiency of the MRELM prediction model. In order to solve the above problems effectively, we propose a solution method of the weights matrix and hidden layer parameters based on the forced positive definite Cholesky factorization.

First, on the basis of equations (4a), (4b), and (4c), can be obtained from equation (4a), and can be obtained from equation (4b), and then substituting and into equation (4c), we can obtainwhere , , and , andwhere , , and .

Therefore, the process of solving according to equation (5a) can be transformed into solving linear equations in the form of equation (25), and the process of solving according to equation (5b) can be transformed into solving linear equations in the form of equation (26); is the dimension of observation vector.

If the number of training samples is greater than the number of hidden nodes, that is, , the solution process of based on the Cholesky factorization is as follows. First, calculate the Cholesky factorization of matrix :where is a lower triangular matrix with positive diagonal elements. The nonzero element in can be calculated by the element of according to equation (28):where , . Substitute equation (27) into equation (25) and multiply both sides of the equation by : where . Solving is equivalent to solving equation (29). Because is equivalent to , the calculation formula for the elements of can be obtained by comparing the elements on both sides of the equation where , is the element of . Finally, on the basis of obtaining and , the element of can be calculated by using the elements of and : where , , .

If the number of hidden nodes is greater than the number of training samples, that is, , the solution process of based on the Cholesky factorization is as follows. First, calculate the Cholesky factorization of :where is a lower triangular matrix with positive diagonal elements. The nonzero element in can be calculated by the element of according to equation (33): where , . Substitute equation (32) into equation (26) and multiply both sides of the equation by : where . Solving is equivalent to solving equation (34). Because is equivalent to , the calculation formula for the element of can be obtained by comparing the elements on both sides of the equation where , is the element of . Finally, on the basis of obtaining and , the element of can be calculated by using the elements of and . where , , .

At this point, we get the connection weight matrix between the first hidden layer and the output layer. The connection weight matrix between the other hidden layer and output layer can also be calculated by the above method. Compared with the solution method of the connecting weight matrix as shown in equations (5a), (5b), (10a), (10b), (15a), and (15b), the solution of based on Cholesky factorization does not involve the inverse operation of the matrix, and it can be achieved by using simple algebraic operations.
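The Cholesky-based solve for the output weights can be sketched as follows: factor the regularized matrix into a lower triangular factor times its transpose, then obtain the weights with one forward and one backward substitution instead of an explicit inverse. The helper names, the one-row-per-sample layout of `H`, and the handling of the case with more hidden nodes than samples through the dual system are assumptions made for this illustration.

```python
import numpy as np

def cholesky_lower(A):
    """Lower triangular S with A = S @ S.T, for symmetric positive definite A."""
    n = A.shape[0]
    S = np.zeros((n, n))
    for j in range(n):
        S[j, j] = np.sqrt(A[j, j] - S[j, :j] @ S[j, :j])
        S[j + 1:, j] = (A[j + 1:, j] - S[j + 1:, :j] @ S[j, :j]) / S[j, j]
    return S

def forward_substitution(S, B):
    """Solve S @ F = B for lower triangular S."""
    F = np.zeros(B.shape)
    for i in range(S.shape[0]):
        F[i] = (B[i] - S[i, :i] @ F[:i]) / S[i, i]
    return F

def backward_substitution(U, F):
    """Solve U @ X = F for upper triangular U."""
    X = np.zeros(F.shape)
    for i in range(U.shape[0] - 1, -1, -1):
        X[i] = (F[i] - U[i, i + 1:] @ X[i + 1:]) / U[i, i]
    return X

def relm_weights_cholesky(H, T, C):
    """Regularized output weights without forming any explicit inverse."""
    n_samples, n_hidden = H.shape
    if n_samples >= n_hidden:
        S = cholesky_lower(H.T @ H + np.eye(n_hidden) / C)
        return backward_substitution(S.T, forward_substitution(S, H.T @ T))
    # More hidden nodes than samples: solve the smaller dual system first.
    S = cholesky_lower(H @ H.T + np.eye(n_samples) / C)
    alpha = backward_substitution(S.T, forward_substitution(S, T))
    return H.T @ alpha

rng = np.random.default_rng(2)
H = rng.uniform(0.0, 1.0, size=(40, 8))   # stand-in for a hidden layer output
T = rng.normal(size=(40, 2))
beta = relm_weights_cholesky(H, T, C=10.0)
```

The regularization term keeps the factored matrix positive definite, so the plain Cholesky factorization is sufficient for this step.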

The MRELM model contains multiple hidden layers, and solving the parameters of each hidden layer requires the MP generalized inverse of the corresponding matrices. However, it can be concluded from equations (7), (8), (11), and (13) that the solution method of and using the orthogonal projection method [5] involves the inversion of the symmetric positive semidefinite matrices , , , and , which is computationally expensive and numerically unstable. If the condition number of these matrices is too large, the MP generalized inverses of matrices and usually cannot be obtained. This not only affects the modeling efficiency and prediction performance of the MRELM model but may also make the modeling process impossible to complete. Moreover, the traditional Cholesky factorization method can only be applied to symmetric positive definite matrices. To effectively overcome the above difficulties, we use the forced positive-definite Cholesky factorization to solve the MP generalized inverse of matrices and , and we then get the hidden layer parameters .

If is nonsingular, then take and substitute it into (7) and (11): where , .

If is nonsingular, then take and substitute it into (7) and (11): where , .

Therefore, the process of solving can be transformed into solving linear equations in the form of equation (37), or solving linear equations in the form of equation (38), is the number of hidden nodes, and is the dimension of the observation vector.

If is nonsingular, the solution process of based on forced positive definite Cholesky factorization is as follows. First, calculate the modified Cholesky factorization result of matrix :where is the unit lower triangular matrix and is the positive definite diagonal matrix. is the nonzero element below the main diagonal in , and is the main diagonal element in ; it can be calculated according to Algorithm 1 by using element of , where , . If we set , then (39a) can be written aswhere . Substitute equation (39b) into equation (37) and multiply both sides of the equation by . where . Solving is equivalent to solving equation (40). Since is equivalent to , by comparing the elements on both sides of the equation, the calculation formula for the element of can be obtained: where is the element of , and is the element of , , . Finally, on the basis of obtaining and , the element of can be calculated by using the elements of and . where , , ; consequently, we can get .

If is nonsingular, the solution process of based on forced positive definite Cholesky factorization is as follows. First, calculate the modified Cholesky factorization results of matrix : where is the unit lower triangular matrix and is the positive definite diagonal matrix. is the nonzero element below the main diagonal in , and is the main diagonal element in ; it can be calculated according to Algorithm 1 by using the element of , , . If we set , then (43a) can be written aswhere . Substitute equation (43b) into equation (38) and multiply both sides of the equation by . where . Solving = is equivalent to solving equation (44). Since is equivalent to , by comparing the elements on both sides of the equation, the calculation formula for the element of can be gotten as where is the element of and is the element of , , . Finally, on the basis of obtaining and , the element of can be obtained by using elements of and . where , , ; therefore, we can get .

If is nonsingular, then let and substitute it into (8) and (13): where , .

If is nonsingular, then let and substitute it into (8) and (13): where = , , , = , .

Therefore, the process of solving can be transformed into solving linear equations in the form of equation (47), or solving linear equations in the form of equation (48), where is the number of hidden nodes.

If is nonsingular, the solution process of based on the forced positive definite Cholesky factorization is as follows. First, calculate the modified Cholesky factorization results of matrix : where is the unit lower triangular matrix and is the positive definite diagonal matrix. is the nonzero element below the main diagonal in , and is the main diagonal element in ; it can be calculated according to Algorithm 1 by using element of , , . If we set , then (49a) can be written as where . Substitute equation (49b) into equation (47) and multiply both sides of the equation by . where . Solving is equivalent to solving equation (50). Since is equivalent to , by comparing the elements on both sides of the equation, the calculation formula for the element of can be obtained as where is the element of and is the element of , , . Finally, on the basis of obtaining and , the element of can be calculated by using the elements of and . where , , ; accordingly, we can obtain the matrix .

If is nonsingular, the solution process of based on the forced positive definite Cholesky factorization is as follows. First, calculate modified Cholesky factorization results of matrix :where is the unit lower triangular matrix and is the positive definite diagonal matrix. is the nonzero element below the main diagonal in , and is the main diagonal element in ; it can be calculated according to Algorithm 1 by using element of , , . If we set , then (53a) can be written aswhere . Substitute equation (53b) into equation (48) and multiply both sides of the equation by . where . Because solving is equivalent to solving equation (54) and is equivalent to , by comparing the elements on both sides of the equation, the calculation formula for the element of can be calculated as where is the element of and is the element of , , . Finally, on the basis of obtaining and , the element of can be calculated by using the elements of and . where , . So we can get .

So far we have obtained the parameters of each hidden layer. Compared with the solving method of hidden layer parameters shown in equations (7), (8), (11), and (13), the calculation approach of based on the forced positive definite Cholesky factorization does not involve the operation of inverse matrix, which can be realized only by simple algebraic calculations. Meanwhile, the method of forcing the matrix to be positive definite matrix can guarantee the numerical stability of MRELM neural network training process.
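To illustrate how an MP generalized inverse can be obtained from triangular solves alone, the sketch below computes the orthogonal-projection forms $(M^T M)^{-1} M^T$ and $M^T (M M^T)^{-1}$ through an $LDL^T$ factorization followed by forward substitution, diagonal scaling, and backward substitution. As a simplifying assumption, the factorization here only floors the diagonal at a small `delta` rather than applying the full forced positive-definite rule of Algorithm 1; all function names are hypothetical.

```python
import numpy as np

def ldl_floor(A, delta=1e-10):
    """Unit lower triangular L and diagonal d with A ~= L @ diag(d) @ L.T.

    Simplified stand-in for Algorithm 1: each d_j is only floored at delta.
    """
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = max(A[j, j] - (L[j, :j] ** 2) @ d[:j], delta)
        L[j + 1:, j] = (A[j + 1:, j] - (L[j + 1:, :j] * d[:j]) @ L[j, :j]) / d[j]
    return L, d

def ldl_solve(L, d, B):
    """Solve (L @ diag(d) @ L.T) @ X = B using only substitutions."""
    n = L.shape[0]
    F = np.zeros(B.shape)
    for i in range(n):                      # forward: L @ F = B
        F[i] = B[i] - L[i, :i] @ F[:i]
    G = F / d[:, None]                      # diagonal scaling
    X = np.zeros(B.shape)
    for i in range(n - 1, -1, -1):          # backward: L.T @ X = G
        X[i] = G[i] - L[i + 1:, i] @ X[i + 1:]
    return X

def pinv_via_ldl(M):
    """MP generalized inverse of a full-rank M via the orthogonal projection forms."""
    rows, cols = M.shape
    if rows >= cols:
        L, d = ldl_floor(M.T @ M)
        return ldl_solve(L, d, M.T)         # (M^T M)^{-1} M^T
    L, d = ldl_floor(M @ M.T)
    return ldl_solve(L, d, M).T             # M^T (M M^T)^{-1}

rng = np.random.default_rng(3)
M = rng.normal(size=(6, 3))
M_pinv = pinv_via_ldl(M)
```

For a well-conditioned full-rank matrix this agrees with `np.linalg.pinv`; the diagonal floor only matters when the Gram matrix is nearly singular.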

3.3. FC-IMRELM Training Algorithm

Studies have shown that the number of hidden layers determines the learning accuracy and generalization ability of the MRELM model [25], and it is also a key factor that must be determined in advance when designing the MRELM network structure. Because the training samples used with the MRELM prediction model are varied and complex, it is difficult to determine the optimal number of hidden layers accurately from human experience alone, that is, to give the MRELM prediction model enough hidden layers to ensure its learning accuracy while keeping the number of hidden layers as small as possible to maintain a compact network structure. To avoid the disadvantages and difficulties of manually selecting the number of hidden layers, we propose an incremental MRELM training algorithm based on forced positive-definite Cholesky factorization, which can automatically determine the optimal number of hidden layers in MRELM. The training process is as follows.

Algorithm 2. Incremental MRELM based on Forced Positive-Definite Cholesky Factorization (FC-IMRELM)
Step 1. Suppose the training sample dataset is , where is the input samples, is the labeled samples, each hidden layer contains the number of hidden nodes , and the activation function is .
Step 2. Let the number of hidden layers in MRELM , randomly initialize the input weights matrix between the input layer and the first hidden layer and the bias vector of the first hidden layer, and let , .
Step 3. Calculate the output matrix of the first hidden layer , and calculate the matrices , , and .
Step 4. Calculate the connection weights matrix between the first hidden layer and the output layer or .
(1) If , the Cholesky factorization factor of matrix is calculated according to the formula (28), and is calculated according to the formula (30) using matrices and . Then matrices and are used to calculate according to equation (31), and we can get .
(2) If , the Cholesky factorization factor of matrix is calculated according to the formula (33), and is calculated according to the formula (35) using and . Then the matrices and are used to calculate according to the equation (36); subsequently, we can obtain .
Step 5. Calculate the expected output of the hidden layer.
(1) If is nonsingular, the forced positive-definite Cholesky factorization factor of matrix is calculated according to equations (39a) and (39b), and is calculated according to equation (41) using and . Then the matrices and are used to calculate according to formula (42), so we can obtain .
(2) If is nonsingular, the forced positive-definite Cholesky factorization factor of is calculated according to equations (43a) and (43b), and is calculated according to equation (45) using and . Then the matrices and are used to calculate according to formula (46); thus we can get .
Step 6. Let , calculate the parameter of the hidden layer, set , is the weights matrix between the hidden layer and the hidden layer, and is the bias vector of the hidden layer.
(1) If is nonsingular, the forced positive-definite Cholesky factorization factor of matrix is calculated according to equations (49a) and (49b), and is calculated according to equation (51) using and . Then the matrices and are used to calculate according to formula (52); therefore we can obtain .
(2) If is nonsingular, the forced positive-definite Cholesky factorization factor of matrix is calculated according to equations (53a) and (53b), and is calculated according to equation (55) using and . Then the matrices and are used to calculate according to formula (56); thereby we can get .
Step 7. Calculate the predicted output matrix of the hidden layer according to the equation .
Step 8. Recalculate the connection weight matrix or between the hidden layer and the output layer according to the principle shown in (28)-(36).
Step 9. On the basis of , the MRELM prediction model with hidden layers is established, and the final output result is calculated.
Step 10. Calculate the sum of the empirical risk and the structural risk of the MRELM prediction model. Then jump to Step 5. Check formula (58) starting from , where is the learning precision and is the maximum value of , . If formula (58) is satisfied, the training process terminates: take as the optimal number of hidden layers and establish the corresponding MRELM prediction model. Otherwise, continue to increase until the condition is met.

The number of hidden layers in MRELM increases successively from the initial value, and the expansion stops when is no longer significantly reduced. At that point, even if further hidden layers are added, the representing the learning accuracy and generalization ability of the MRELM model will not improve significantly; the additions only introduce a large number of redundant hidden layers. Therefore, the MRELM model at this point has the optimal number of hidden layers.
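The layer-growing loop can be sketched as follows. Formula (58) is not reproduced here, so the stopping rule is an assumption: growth stops when one more layer reduces the risk by less than a learning precision `eps`, or when a maximum layer count is reached. The callback `risk_for_layers` is hypothetical and stands for "train an MRELM with k hidden layers and return the sum of its empirical and structural risk".

```python
def select_hidden_layers(risk_for_layers, eps, max_layers):
    """Grow the layer count from 1 and stop once adding a layer no longer
    reduces the risk by at least eps (assumed criterion) or max_layers is hit.
    Returns (chosen_layer_count, risk_history)."""
    history = [risk_for_layers(1)]
    k = 1
    while k < max_layers:
        candidate = risk_for_layers(k + 1)
        history.append(candidate)
        if history[-2] - candidate < eps:
            break          # no significant improvement: keep k layers
        k += 1
    return k, history


# Usage with made-up risk values (illustrative numbers, not results from the paper).
toy_risk = {1: 0.50, 2: 0.30, 3: 0.22, 4: 0.215, 5: 0.214}
layers, history = select_hidden_layers(lambda k: toy_risk[k], eps=0.01, max_layers=5)
print(layers, history)
```

In this sketch the last layer that still produced a significant reduction is kept, which is one reading of "the expansion stops when the risk is no longer significantly reduced".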

3.4. Proof of Positive Definiteness for FC-IMRELM

In the implementation of the FC-IMRELM algorithm, solving for the connection weights matrix and the learning parameters in the hidden layer can be transformed into solving linear equations of the form of equations (25), (26), (37), (38), (47), and (48). Standard Cholesky factorization can be applied to a linear system only if its coefficient matrix is symmetric positive definite, so we need to prove that the matrices and are symmetric positive definite. Forced positive-definite Cholesky factorization requires only that the coefficient matrix be symmetric; therefore it is necessary to prove that the matrices , , , and are symmetric positive semidefinite. The following theorems show that the matrices and are symmetric positive semidefinite and that is symmetric positive definite; hence the matrices and are symmetric positive definite, too. Consequently, Eq. (25) and Eq. (26) can be solved by standard Cholesky factorization to obtain the connection weights matrix in the hidden layer. In addition, the matrices , , , and are symmetric positive semidefinite, so Eq. (37), Eq. (38), Eq. (47), and Eq. (48) can be solved by forced positive-definite Cholesky factorization to obtain the learning parameters in the hidden layer.
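As an illustration of the standard Cholesky route for the symmetric positive definite case, the sketch below solves a regularized system of the assumed form (H^T H + I/c) beta = H^T t. The symbols `H`, `t`, `c`, and `beta`, and the function names, are our own notation for illustration; the exact form of Eq. (25) and Eq. (26) is not reproduced here.

```python
import math

def cholesky(G):
    """Lower-triangular R with R R^T = G for a symmetric positive definite G."""
    n = len(G)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(R[i][k] * R[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                R[i][j] = math.sqrt(s)
            else:
                R[i][j] = s / R[j][j]
    return R

def ridge_output_weights(H, t, c):
    """Solve (H^T H + I/c) beta = H^T t for one output column t, with c > 0."""
    m, n = len(H), len(H[0])
    G = [[sum(H[r][i] * H[r][j] for r in range(m)) + (1.0 / c if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(H[r][i] * t[r] for r in range(m)) for i in range(n)]
    R = cholesky(G)
    y = [0.0] * n
    for i in range(n):                        # forward substitution: R y = rhs
        y[i] = (rhs[i] - sum(R[i][k] * y[k] for k in range(i))) / R[i][i]
    beta = [0.0] * n
    for i in reversed(range(n)):              # back substitution: R^T beta = y
        beta[i] = (y[i] - sum(R[k][i] * beta[k] for k in range(i + 1, n))) / R[i][i]
    return beta
```

Because the I/c term makes the coefficient matrix positive definite even when `H` is rank deficient, the plain factorization does not break down in this case.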

Theorem 3. Let be a matrix. Then and are all symmetric positive definite matrices.

Proof. Obviously, for the matrix , then by properties of transpose we have So, is symmetric.
Similarly, for the matrix , applying properties of transpose again, we get Hence, is symmetric.
On the other hand, let be a nonzero vector in . We can easily verify that In addition, Thus, the matrix is positive definite.
Also, let be a nonzero vector in . Then we deduce that Additionally, Consequently, the matrix is positive definite.

Theorem 4. Let be a matrix. Then and are all symmetric positive semidefinite matrices.

Proof. First, for the matrix , it is not hard to see from properties of transpose that Thus, is symmetric.
Second, by the same method we have for Therefore, is symmetric.
Finally, it is clear that is a square symmetric matrix. For any nonzero vector in , we have Thus, the matrix is positive semidefinite.
It is also clear that is a square symmetric matrix. For any nonzero vector in , we get Consequently, the matrix is positive semidefinite.

Theorem 5. Let be a matrix. Then and are all symmetric positive semidefinite matrices.

Proof (apparent). We can prove Theorem 5 using the same idea as in Theorem 4.
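The quadratic-form argument behind these theorems can be written out explicitly. The notation below is assumed for illustration only: $H$ is a generic real $m \times n$ matrix, $c > 0$ is a regularization constant, and $x \in \mathbb{R}^{n}$ is a nonzero vector.

```latex
\begin{aligned}
(H^{T}H)^{T} &= H^{T}(H^{T})^{T} = H^{T}H, \\
x^{T}(H^{T}H)\,x &= (Hx)^{T}(Hx) = \lVert Hx \rVert_{2}^{2} \ge 0, \\
x^{T}\!\left(H^{T}H + \tfrac{1}{c}I\right)x
  &= \lVert Hx \rVert_{2}^{2} + \tfrac{1}{c}\,\lVert x \rVert_{2}^{2} > 0 .
\end{aligned}
```

The first line gives symmetry, the second gives positive semidefiniteness of a Gram-type matrix, and the third shows that adding a positive multiple of the identity makes it positive definite. The same steps apply with $HH^{T}$ in place of $H^{T}H$ and $x \in \mathbb{R}^{m}$.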

4. Results and Discussion

In this section, experiments with the proposed FC-IMRELM are conducted on benchmark datasets for classification and regression problems. To assess the improvement in learning accuracy, the original ELM [5], TELM [17], and MRELM [18] are also evaluated. All performance assessments are conducted in the MATLAB 2014b computational environment running on the Windows 10 operating system with an Intel Core™ i5-7200U CPU @2.7 GHz, 8 GB RAM, and an NVIDIA M150 graphics card with 2 GB GDDR5 video memory. To compare the resulting performances on equal terms, every algorithm in the experiments uses the sigmoid activation function: , where is 1. The number of hidden neurons is set to 20. Each algorithm is run for 100 trials, and the results are averaged to give the final value.

4.1. Classification Problems

4.1.1. Characteristics of Classification Datasets

The classification datasets are obtained from the UCI website [26] and the literature [27, 28]. To evaluate the robustness of our FC-IMRELM algorithm, we conducted the tests using simple benchmark datasets and real datasets collected from the coal and iron ore industries. The characteristics of the datasets are shown in Table 1.

4.1.2. Evaluation of Testing Accuracy on Classification Datasets

To make the performance evaluation more comprehensive, real datasets containing complex industrial data were added to it. The original ELM, TELM, and MRELM algorithms are tested on the simple benchmark datasets and the real datasets to validate the improvement in learning accuracy of our FC-IMRELM algorithm. From Table 2 and Figure 1, we can see that every algorithm achieves good classification accuracy on the Banknote dataset. For the Blood, Diabetic, Wilt, Coal spectral, and Iron spectral datasets, TELM, MRELM, and FC-IMRELM all outperform ELM, and FC-IMRELM has the highest classification accuracy. For the Image dataset, the classification accuracy of TELM, MRELM, and FC-IMRELM is much higher than that of ELM, and FC-IMRELM is again the most accurate, reaching 91.98%. The experimental results show that the average classification accuracy of our FC-IMRELM algorithm is significantly higher than that of the original ELM, TELM, and MRELM algorithms, and the experiments on the Coal spectral and Iron spectral datasets also demonstrate that FC-IMRELM can be readily extended to practical applications.

4.2. Regression Problems

For the regression model, the root-mean-square error (RMSE) and the coefficient of determination (R2) [29] are used as the performance evaluation indexes to verify the effectiveness of the proposed FC-IMRELM algorithm. R2 and RMSE are expressed as follows: where is the number of samples in the prediction set; is the actual value of the sample; is the average of the actual values; is the predicted value calculated by the model. The value of R2 lies in the range (0, 1). The closer R2 is to 1 and the smaller the RMSE, the better the performance of the model.
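For reference, the two indexes can be computed as in the sketch below. It uses the standard textbook definitions, RMSE as the square root of the mean squared residual and R2 as one minus the ratio of the residual sum of squares to the total sum of squares; that these match the paper's formulas exactly is an assumption, and the function names are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error over the prediction set."""
    n = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (not data from the paper).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print(rmse(actual, predicted), r2_score(actual, predicted))
```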

4.2.1. Characteristics of Regression Datasets

The regression datasets are obtained from the LIBSVM website [30]. The characteristics of the datasets are shown in Table 3.

4.2.2. Evaluation of Estimation Accuracy on Regression Datasets

As shown in Table 4 and Figure 2, the algorithms TELM, MRELM, and FC-IMRELM all give good prediction results for the Bodyfat dataset: their RMSE values are small and their R2 values are above 0.98, while the prediction ability of the ELM algorithm is slightly worse, with R2 equal to 0.8. For the Pyrim and Triazines datasets, the prediction results of the FC-IMRELM algorithm are better than those of ELM, TELM, and MRELM, with the smallest RMSE and the highest R2. This analysis of the experimental results indicates that the advantage of the FC-IMRELM algorithm is that it can better extract data characteristics from multiattribute data, so it has better predictive ability.

5. Conclusions

(1) First of all, compared with MRELM, the FC-IMRELM algorithm proposed in this paper uses forced positive-definite Cholesky factorization to determine the hidden layer parameters. The training process is simpler, with a lower computational cost and higher numerical stability. In addition, the MRELM algorithm needs the network structure to be set in advance, and the number of hidden layers remains unchanged during training, while the FC-IMRELM algorithm can automatically select the optimal number of hidden layers through the principle of structural risk and empirical risk minimization and adjust the network structure adaptively according to the training samples.

(2) Secondly, compared with CF-FORELM, the FC-IMRELM algorithm proposed in this paper is designed for the positive semidefinite matrices that appear in the parameter solving process of the MRELM model. Forcing the matrix to be positive definite also improves its condition number, thereby accelerating the convergence of the MRELM model and ensuring the numerical stability of the modeling process.

(3) Finally, by introducing parameter to weigh the structural risk and empirical risk of the ELM model, the FC-IMRELM algorithm has significantly improved its generalization ability compared with the traditional neural network. In addition, the forced positive-definite Cholesky factorization is used to calculate its output weights, effectively reducing the computational cost incurred as hidden layers are added. The prediction examples show that the FC-IMRELM algorithm can effectively avoid the numerical instability of the MRELM model and has the advantages of high prediction accuracy and fast calculation speed, which can provide a novel and efficient solution to the prediction problem.

Data Availability

The datasets are obtained from [26–28, 30]. All data is available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB0304100, in part by the National Natural Science Foundation of China under Grant 71672032, and in part by the Fundamental Research Funds for Central University under Grant N180404012 and Grant N182608003.

References

[1] C.-M. Chang, T.-K. Lin, and C.-W. Chang, “Applications of neural network models for structural health monitoring based on derived modal properties,” Measurement, vol. 129, pp. 457–470, 2018.
[2] H. Kim, C. Sui, K. Cai, B. Sen, and J. Fan, “An efficient high-speed channel modeling method based on optimized design-of-experiment (DoE) for artificial neural network training,” IEEE Transactions on Electromagnetic Compatibility, vol. 60, no. 6, pp. 1648–1654, 2018.
[3] Z.-Y. Wang, C. Lu, and B. Zhou, “Fault diagnosis for rotary machinery with selective ensemble neural networks,” Mechanical Systems and Signal Processing, vol. 113, pp. 112–130, 2018.
[4] G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: a new learning scheme of feedforward neural networks,” in Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990, July 2004.
[5] G. B. Huang, Q. Y. Zhu, and C. K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[6] M. Sahani and P. K. Dash, “Variational mode decomposition and weighted online sequential extreme learning machine for power quality event patterns recognition,” Neurocomputing, vol. 310, pp. 10–27, 2018.
[7] L. Yang and S. Zhang, “A sparse extreme learning machine framework by continuous optimization algorithms and its application in pattern recognition,” Engineering Applications of Artificial Intelligence, vol. 53, pp. 176–189, 2016.
[8] W. Zheng, X. Peng, D. Lu et al., “Composite quantile regression extreme learning machine with feature selection for short-term wind speed forecasting: a new approach,” Energy Conversion and Management, vol. 151, pp. 737–752, 2017.
[9] Y. Chen and W. Wu, “Mapping mineral prospectivity using an extreme learning machine regression,” Ore Geology Reviews, vol. 80, pp. 200–213, 2017.
[10] S. Yuong Wong, K. Siah Yap, and H. Jen Yap, “A Constrained Optimization based Extreme Learning Machine for noisy data regression,” Neurocomputing, vol. 171, pp. 1431–1443, 2016.
[11] W. Y. Deng, Q. H. Zheng, and L. Chen, “Regularized extreme learning machine,” in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM '09), pp. 389–395, April 2009.
[12] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning machine for regression and multiclass classification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.
[13] S. Ding, G. Ma, and Z. Shi, “A rough RBF neural network based on weighted regularized extreme learning machine,” Neural Processing Letters, vol. 40, no. 3, pp. 245–260, 2014.
[14] J. M. Martínez-Martínez, P. Escandell-Montero, E. Soria-Olivas, J. D. Martín-Guerrero, R. Magdalena-Benedito, and J. Gómez-Sanchis, “Regularized extreme learning machine for regression problems,” Neurocomputing, vol. 74, no. 17, pp. 3716–3721, 2011.
[15] W. Zheng, Y. Qian, and H. Lu, “Text categorization based on regularization extreme learning machine,” Neural Computing and Applications, vol. 22, no. 3-4, pp. 447–456, 2013.
[16] X.-R. Zhou and C.-S. Wang, “Cholesky factorization based online regularized and kernelized extreme learning machines with forgetting mechanism,” Neurocomputing, vol. 174, pp. 1147–1155, 2016.
[17] B. Qu, B. Lang, J. Liang, A. Qin, and O. Crisalle, “Two-hidden-layer extreme learning machine for regression and classification,” Neurocomputing, vol. 175, pp. 826–834, 2016.
[18] D. Xiao, B. Li, and Y. C. Mao, “A multiple hidden layers extreme learning machine method and its application,” Mathematical Problems in Engineering, vol. 2017, Article ID 4670187, 10 pages, 2017.
[19] X. Wen, H. Liu, G. Yan, and F. Sun, “Weakly paired multimodal fusion using multilayer extreme learning machine,” Soft Computing, vol. 22, no. 11, pp. 3533–3544, 2018.
[20] X. Su, S. Zhang, Y. Yin, Y. Liu, and W. Xiao, “Data-driven prediction model for adjusting burden distribution matrix of blast furnace based on improved multilayer extreme learning machine,” Soft Computing, vol. 22, no. 11, pp. 3575–3589, 2018.
[21] X. Li, W. Mao, and W. Jiang, “Multiple-kernel-learning-based extreme learning machine for classification design,” Neural Computing and Applications, vol. 27, no. 1, pp. 175–184, 2016.
[22] Y. Yang, Q. M. J. Wu, Y. Wang, K. M. Zeeshan, X. Lin, and X. Yuan, “Data partition learning with multiple extreme learning machines,” IEEE Transactions on Cybernetics, vol. 45, no. 8, pp. 1463–1475, 2015.
[23] P. E. Gill and W. Murray, “Newton-type methods for unconstrained and linearly constrained optimization,” Mathematical Programming, vol. 7, pp. 311–350, 1974.
[24] P. E. Gill and W. Murray, “Quasi-Newton methods for unconstrained optimization,” Journal of the Institute of Mathematics and Its Applications, vol. 9, pp. 91–108, 1972.
[25] C. M. Wong, C. M. Vong, P. K. Wong, and J. Cao, “Kernel-based multilayer extreme learning machines for representation learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 3, pp. 757–762, 2018.
[26] http://archive.ics.uci.edu/ml/datasets.html.
[27] Y. Mao, B. T. Le, D. Xiao et al., “Coal classification method based on visible-infrared spectroscopy and an improved multilayer extreme learning machine,” Optics and Laser Technology, vol. 114, pp. 10–15, 2019.
[28] D. Xiao, C. Liu, and B. T. Le, “Detection method of TFe content of iron ore based on visible-infrared spectroscopy and IPSO-TELM neural network,” Infrared Physics and Technology, vol. 97, pp. 341–348, 2019.
[29] B. T. Le, D. Xiao, Y. Mao, and D. He, “Coal analysis based on visible-infrared spectroscopy and a deep neural network,” Infrared Physics & Technology, vol. 93, pp. 34–40, 2018.
[30] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets.

Copyright

Copyright © 2019 Jingyi Liu and Ba Tuan Le. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
