Abstract

A method that uses the real-coded quantum-inspired genetic algorithm (RQGA) to optimize the weights and thresholds of a BP neural network is proposed to overcome the defect that the gradient descent method easily traps the algorithm in local optima during learning. The quantum genetic algorithm (QGA) has good directional global optimization ability, but the conventional QGA is based on binary coding, and the encoding and decoding processes slow down the calculation. Therefore, RQGA is introduced to explore the search space, and an improved variable learning rate is adopted to train the BP neural network. Simulation tests show that the proposed algorithm converges rapidly to a solution that satisfies the constraint conditions.

1. Introduction

Artificial neural networks (ANNs) are put forward to solve nonlinear problems by simulating the operation of the nervous system. ANNs are powerful tools for the prediction of nonlinearities [1], with excellent nonlinear mapping ability, generalization, self-organization, and self-learning. ANNs have been widely applied in engineering and are steadily advancing into new areas [2].

The “feed-forward, back propagation” neural network (BPNN) is currently the most popular network architecture in use [3]. BPNN can be applied in a variety of fields according to the characteristics of the model. Sun et al. establish a prediction model based on an improved BP neural network and adopt it to investigate quantitative evolution laws of equiaxed α in near-β forging of TA15 Ti-alloy [4]. Xiao et al. propose an approach of back propagation neural network with rough set for complicated short-term load forecasting with dynamic and nonlinear factors to improve the accuracy of predictions [5]. Wang et al. apply improved variable learning rate back propagation (IVL-BP) to short-term power load forecasting [6]. Yu et al. propose a dynamic all-parameters-adaptive BP neural network model through fusion of the genetic algorithm, the simulated annealing algorithm, and the BP neural network and apply it to oil reservoir prediction [2].

Although systems based on BPNN perform well, BPNN lacks stability in some cases. The main faults of BPNN include the following. (1) A fixed learning rate leads to slow convergence of the network and long training time. (2) The gradient descent algorithm used to optimize the objective function can make the computation overflow or oscillate between values and fail to converge to the global optimum [7]. (3) Structure and scale strongly influence the performance of BPNN: different numbers of hidden nodes and different transfer functions applied to the same data produce different results. (4) The convergence of BPNN is influenced by the choice of initial weights; improper selection of the initial weights traps the algorithm in a local optimum. (5) The adjustment of weight and threshold values follows a fixed rule; the structure cannot be adjusted adaptively within a fixed layout. (6) The learning algorithm of BPNN is based on back propagation of error, which leads to slow convergence and makes the network fall easily into local minima.

To overcome the abovementioned problems, many scholars have put forward improved algorithms. Although the disadvantages have not been ultimately overcome, the study of BPNN has gradually advanced. The different methods can be classified as follows.

(1) The Optimization of Network Structure. Experiential or statistical methods are applied to determine the structure of BPNN, that is, the optimal combination of the number of hidden layers, the number of hidden neurons, the choice of input factors, and the parameters of the training algorithm [8]. The most systematic and general method is to utilize Taguchi's principle of design of experiments [9]. Grey correlation analysis can be used to determine the number of hidden nodes of the optimal network and improve its performance [10]. Network growth/removal algorithms add or remove neurons from the initial structure according to a predetermined standard that reflects the effect of the change on the performance of the ANN. The basic rule is to add neurons when the training process is slow or the mean square deviation is greater than the specified value, and to remove neurons when changing the number of neurons does not change the response of the network accordingly or when neuron weights remain unchanged for a long time. The growth/removal algorithm is basically a gradient descent method that cannot guarantee convergence to the global minimum; therefore, it may fall into a local optimum near the initial point. The structure can also be changed by applying genetic operators and evaluating an objective function [11].

(2) The Improved Training Method of BPNN. Some new methods can be introduced to train the neural network; for example, the online gradient method with changing scale can be used to train BPNN to achieve better convergence [12], and the real-coded chaotic quantum genetic algorithm has been applied to train a fuzzy neural network to accelerate convergence [13]. The transfer function, the parameters, the assessment accuracy, and the gradient descent coefficient can also be improved in the process of training.

(3) The Combination of BPNN with Other Optimization Algorithms to Optimize the Weights and Thresholds. Zhuo et al. propose a simulated annealing- (SA-) genetic algorithm-BPNN-based color correction algorithm for traditional Chinese medicine tongue images [14]. Liu et al. apply GA-BP to predict bitterness intensity [15]; GA-BP is compared with multiple linear regression, partial least squares regression, and the plain BP method to demonstrate the superiority of the GA-BP model. Wang et al. improve BPNN by introducing the cuckoo search algorithm to forecast lightning occurrence from sounding-derived indices over Nanjing [16].

Through this analysis, we can see that most methods for optimizing the weights and thresholds of BPNN are based on combination with GA or SA [14] or a combination of all three algorithms. GA and SA have some major disadvantages. When solving complex optimization problems, the convergence of GA is slow and may stagnate prematurely, and the required numbers of populations and individuals are large. SA utilizes the principle of the crystallization process, in which metal settles into a minimum-energy state, to search for the minimum of a general system. SA was first proposed in the literature by Kirkpatrick to find balanced combinations of sets of atoms at a given temperature [17]. Compared with other methods, the main advantage of SA is its ability to avoid falling into local optima: SA is a random search algorithm in which both better and worse values can be accepted with a certain probability. However, the computational cost of SA is large, especially for complex problems.

RQGA is a global optimization algorithm that can find the global optimal solution in complex, multiextremal, and nondifferentiable vector spaces when the number of parameters is small. RQGA has fast convergence speed and strong optimization ability and does not easily converge to a local optimum. Introducing RQGA to optimize the weights and thresholds of BPNN can therefore yield better solutions with higher probability.

2. BP Neural Network

BPNN is a kind of multilayer feed-forward network trained with the error back propagation algorithm proposed by Rumelhart and McClelland in 1986. "Back propagation" means that the weights of the network are adjusted by propagating the error backward. Because of its simple structure, many adjustable parameters, and good operability, BPNN is one of the most widely used artificial neural network algorithms.

BPNN is a typical feed-forward network [18]. During training, signals are transferred forward through the network structure while the weights and thresholds are changed by propagating the error backward. After the training model of the sample structure is established, the samples to be measured are handled by the trained model. The operating formula of BP is

$$Y = f(WX - \theta),$$

where $X$ is the input matrix, $W$ is the weight matrix, $\theta$ is the threshold matrix, and $f$ is the transfer function.

The specific process of BPNN is shown in Figure 1.

(1) Initialization: determine the weights $w_{ij}$, $v_{jt}$ and the thresholds $\theta_j$, $\gamma_t$ randomly.

(2) Select a group of input and output data $P = (a_1, a_2, \ldots, a_n)$ and $T = (y_1, y_2, \ldots, y_q)$. Calculate the input of the hidden layer $s_j$ with the input sample $a_i$, weight $w_{ij}$, and threshold $\theta_j$: $s_j = \sum_{i=1}^{n} w_{ij} a_i - \theta_j$, $j = 1, 2, \ldots, p$. Then calculate the output of the hidden layer with the transfer function $f$: $b_j = f(s_j)$. Calculate the input of the output layer $L_t$ with $b_j$, $v_{jt}$, and $\gamma_t$: $L_t = \sum_{j=1}^{p} v_{jt} b_j - \gamma_t$. Then calculate the response output of the output layer by the transfer function: $c_t = f(L_t)$, $t = 1, 2, \ldots, q$.

(3) Calculate the error of every output neuron with the target output $y_t$ of the network and the corresponding response output $c_t$: $e_t = y_t - c_t$, $t = 1, 2, \ldots, q$. Then calculate the total error: $E = \frac{1}{2} \sum_{t=1}^{q} (y_t - c_t)^2$.

(4) If the error in (3) is smaller than the preset error, end the training process; if not, correct $v_{jt}$ and $\gamma_t$ with $b_j$ and $e_t$, and at the same time correct $w_{ij}$ and $\theta_j$ with $a_i$ and the back-propagated error (see the sketch after this list).

(5) Select another group of input and output data randomly from the sample set and return to step (2). Continue this process until the end condition is satisfied.
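The following minimal sketch in Python illustrates one iteration of steps (2)-(4) for a sigmoid transfer function. The array names, shapes, and in-place update style are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_iteration(a, y, w, theta, v, gamma, lr=0.1):
    """One BP iteration; assumed shapes: a (n,), y (q,),
    w (p, n), theta (p,), v (q, p), gamma (q,)."""
    s = w @ a - theta                      # step (2): hidden-layer input s_j
    b = sigmoid(s)                         # hidden-layer output b_j
    L = v @ b - gamma                      # output-layer input L_t
    c = sigmoid(L)                         # response output c_t
    e = y - c                              # step (3): per-neuron error e_t
    E = 0.5 * np.sum(e ** 2)               # total error E
    d_out = e * c * (1 - c)                # step (4): output-layer gradient
    d_hid = (v.T @ d_out) * b * (1 - b)    # hidden-layer gradient
    v += lr * np.outer(d_out, b)           # correct v with b and d_out
    gamma -= lr * d_out                    # correct gamma
    w += lr * np.outer(d_hid, a)           # correct w with a and d_hid
    theta -= lr * d_hid                    # correct theta
    return E
```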

The learning rate of the conventional BP network is constant. If the learning rate is too small, convergence is guaranteed but learning is slow; if the learning rate is too large, it is easy to cause large fluctuations or deviation from the optimal solution. So the learning rate needs to be adjusted during training.

The basic idea of the variable learning rate is as follows: if the average variance increases beyond the preset value after a weight update, decrease the learning rate; if it increases but stays below the preset value, keep the learning rate unchanged; if the average variance decreases, increase the learning rate.
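A minimal sketch of this rule follows; the increase/decrease factors and the tolerance are illustrative assumptions, since the text does not fix the constants:

```python
def adjust_learning_rate(lr, mse_new, mse_old, inc=1.05, dec=0.7, tol=1.04):
    """Variable learning rate rule; inc, dec, tol are assumed constants."""
    if mse_new > mse_old * tol:   # error grew past the preset bound
        return lr * dec           # decrease the learning rate
    if mse_new < mse_old:         # error decreased
        return lr * inc           # increase the learning rate
    return lr                     # otherwise leave it unchanged
```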

3. Real-Coded Quantum-Inspired Genetic Algorithm

Conventional QGA is based on binary coding and can solve combinatorial optimization problems well, such as the traveling salesman problem [19], the knapsack problem [20, 21], and filter design [22]. Using binary numbers to represent parameters forces a trade-off between the accuracy of the representation and string length. RQGA is better at optimizing real-valued problems with multiple extrema. Optimization of the weights and thresholds of BPNN is a typical real-number optimization problem, and for such multiparameter problems real encoding is considered better than binary and Gray coding [23, 24]. RQGA retains the inherent advantages of QGA: its search performance and quantum operators make it effective, flexible, and robust. Since RQGA solves problems with real parameters well, it is applied here to optimize the BPNN.

3.1. Coding Method of RQGA

An initial chromosome includes a string of real values and a string of quantum bit values and is expressed as $(x_1, x_2, \ldots, x_n \mid \theta_1, \theta_2, \ldots, \theta_n)$, where $x_i$ is the value of the real number coding and $\theta_i$ denotes the phase angle of the quantum bit. So each chromosome contains information of the real number space and the phase space at the same time.

The characteristics of RQGA are as follows. (1) Quantum bit coding gives the population better diversity and reduces the amount of calculation. (2) RQGA utilizes special quantum evolutionary operators to generate candidate solutions containing real parameters, which is different from the candidate solutions generated by quantum observation in QGA. (3) RQGA applies the quantum rotation gate to realize the evolution of quantum bits, the same as QGA. (4) Migration between different quantum bit strings realizes migration between populations of different solutions, so the convergence degree and the quality of the solution are improved.

The following method is utilized to generate real-number candidate solution strings. There is a group of quantum bit strings $q_i^t$, where $q_i^t$ is the $i$th quantum bit string in the $t$th generation. Correspondingly, there is another group of strings $x_i^t$; each string contains $n$ real numbers. There are $n$ quantum bits in each $q_i^t$, the $j$th representing the probability amplitudes $\alpha_{ij}^t$ and $\beta_{ij}^t$ of $x_{ij}^t$. The probability of generating a real number larger (smaller) than the present number is determined by $|\beta_{ij}^t|^2$ ($|\alpha_{ij}^t|^2$). All the probabilities are equal at the beginning of the search; $\alpha_{ij}^0$ and $\beta_{ij}^0$ are initialized to $1/\sqrt{2} \approx 0.707$. Every element of $x_i^0$ is initialized to a random number in the allowable range. Each pair of $q_i^t$ and $x_i^t$ constitutes the $i$th family in the $t$th generation.

The solution strings of the $i$th family are generated by $q_i^t$, $x_i^t$, and $b$ (the best solution found so far). Fitness is calculated under the constraint condition. The process of generating the solution strings is shown in Figure 2.
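A minimal initialization sketch, assuming phase-angle coding with $\alpha = \cos\theta$ and $\beta = \sin\theta$ so that both amplitudes start at $1/\sqrt{2} \approx 0.707$; the function and argument names are assumptions:

```python
import numpy as np

def init_population(k, n, x_min, x_max, seed=0):
    """Initialize k families of n phase angles and n real genes."""
    rng = np.random.default_rng(seed)
    theta = np.full((k, n), np.pi / 4)           # alpha = beta = 0.707
    x = rng.uniform(x_min, x_max, size=(k, n))   # real-coded strings
    return theta, x
```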

3.2. Evolutionary Method of RQGA

Two kinds of neighbor operators are adopted to generate new strings. The neighbor solution strings are generated by Neighbor Operator 1 (NO1) and Neighbor Operator 2 (NO2), and the new string of the $i$th family is determined by the best neighbor solution: if the best neighbor solution is better than $x_i^t$, it replaces $x_i^t$ to become $x_i^{t+1}$. If the optimal value among the new family strings is better than $b$, it replaces $b$.

The basic principle of the two operators is as follows: NO1 has better exploration performance, generating solution strings very different from the given string, while NO2 has better exploitation performance, making $x_i^t$ converge toward $b$ as the algorithm proceeds. So the algorithm keeps a balance between exploration and exploitation. The evolution of the quantum bits represents the evolution of the superposition state, and the change of $\alpha$ and $\beta$ is translated into the change of the real numbers generated by the two neighbor operators. In short, NO1 is used to search the solution space and NO2 is used to converge to the extreme value.

Neighbor Operator 1: in the $t$th generation, there are $k$ quantum bit strings $q_i^t$, each containing $n$ quantum bits. NO1 generates solution strings of $n$ elements each.

Generate an array $R$ with $n$ elements, where each element of $R$ takes the value +1 or −1 at random. Let $R_j$ be the $j$th element of $R$; the new phase angle is then $\theta_{ij}^{t+1} = \theta_{ij}^{t} + \Delta\theta_j$, where $\Delta\theta_j$ is the change of angle bounded by the rotation angle $\theta_{\max}$: if $R_j = +1$, $\Delta\theta_j$ is a random number in the range $(0, \theta_{\max}]$; if $R_j = -1$, $\Delta\theta_j$ is a random number in the range $[-\theta_{\max}, 0)$.

The new probability amplitudes are given by $[\alpha_{ij}^{t+1}, \beta_{ij}^{t+1}]^{T} = [\cos\theta_{ij}^{t+1}, \sin\theta_{ij}^{t+1}]^{T}$.

Each element of the new solution string is then generated from the current element and the new probability amplitudes within the allowable range, where $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of that range. The flowchart of using NO1 to calculate the $j$th element of the $i$th individual of the population in the $t$th generation is shown in Figure 3.

Neighbor Operator 2: most of the mechanism of NO2 is the same as NO1. The difference is that each value generated by NO2 lies between the current element $x_{ij}^t$ and the corresponding element of $b_i^t$, the best individual of the $i$th family in the $t$th generation, so the generated point exploits the neighborhood of the best solution.

NO1 is better at exploration; NO2 is better at exploitation. Exploration matters more early in the evolution and exploitation matters more late, so NO1 is applied with greater frequency in the early stage and NO2 with greater frequency in the late stage. The use frequencies of NO1 and NO2 are functions of $g$, the current evolutionary generation, and $G$, the total number of evolutionary generations in every cycle: the frequency of NO1 decreases and that of NO2 increases as $g$ grows.
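The sketch below illustrates the two operators and the scheduling idea. The rotation bound, the sampling rules, and the linear frequency schedule are assumptions made for illustration; the paper's exact formulas are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

def neighbor_op1(theta, x, x_min, x_max, theta_max=0.05 * np.pi):
    """NO1 sketch: random signed rotation, then a jump anywhere in the
    allowable range, with direction biased by the amplitudes."""
    r = rng.choice([-1.0, 1.0], size=theta.shape)       # random +1/-1 array
    theta_new = theta + r * rng.uniform(0.0, theta_max, size=theta.shape)
    beta2 = np.sin(theta_new) ** 2                      # P(move toward x_max)
    up = rng.random(theta.shape) < beta2
    x_new = np.where(up,
                     x + rng.random(theta.shape) * (x_max - x),
                     x - rng.random(theta.shape) * (x - x_min))
    return theta_new, x_new

def neighbor_op2(x, best):
    """NO2 sketch: sample each element between x and the family best."""
    return x + rng.random(x.shape) * (best - x)

def operator_frequencies(g, G):
    """Assumed linear schedule: NO1 dominates early, NO2 late."""
    return 1.0 - g / G, g / G    # (frequency of NO1, frequency of NO2)
```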

3.3. Update of Quantum Bit String

The states of all quantum bits in $q_i^t$ change during the update process so that the probability of generating a solution similar to the current optimal solution increases gradually. The rate of this change is determined by the learning rate $\lambda$; the value of $\lambda$ under different conditions is shown in Table 1.

$\lambda$ determines how fast a quantum bit's amplitude moves from 0.707 to its final value of 0 or 1. $\lambda$ needs to be small enough that the amplitude takes a large number of generations to travel from 0.707 to 0 (or 1); then the probability of generating a solution similar to the current best solution becomes large once most of the quantum bits have converged to 0 or 1.

Two kinds of migration, global and local, take effect together: in local migration, a randomly selected family best solution is used to update some of the other family bests; in global migration, the global best solution $b$ is used to update the family bests. The particularity of the application object also needs to be considered when RQGA is applied to a specific problem. The flowchart of RQGA is shown in Figure 4.
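A minimal sketch of the update step, assuming the rule in Table 1 rotates each phase angle by $\lambda$ in the direction that makes regenerating the current best solution more likely:

```python
import numpy as np

def update_quantum_bits(theta, x, best, lam=0.01 * np.pi):
    """Rotate each angle toward the best string; lam is the learning rate."""
    direction = np.sign(best - x)   # raise P(larger) where best > x, else lower
    return np.clip(theta + lam * direction, 0.0, np.pi / 2)
```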

RQGA has good global search ability. It is usually not restricted by conditions such as the nature of the problem or the structure of the problem model, and it converges to the global optimal solution with a larger probability. The robustness of RQGA makes it suitable for combination with the BP algorithm to improve the generalization and learning ability of the neural network. Moreover, the real-number encoding avoids the encoding and decoding steps of the binary encoding manner, improving computational efficiency.

4. The BPNN Based on RQGA

The convergence speed of BPNN is slow, so RQGA is introduced to optimize the parameters of the network, speed up convergence, and obtain the global optimal solution. RQGA is a global search process that moves from one population to another, sampling the parameter space unceasingly and steering the search toward the area of the current optimal solution. The BPNN based on RQGA (RQGA-BP) combines the advantages of RQGA and BPNN: RQGA optimizes the weights and thresholds of the input and hidden layers so that the BP algorithm avoids being trapped in local minima.

The process of RQGA optimizing BPNN is divided into three parts: structure determination of BPNN, RQGA optimization, and BPNN training. Structure determination specifies the numbers of input and output parameters. RQGA optimization tunes the weights and thresholds of BPNN; each individual of the population contains all the weights and thresholds of the network. The flowchart of RQGA-BP is shown in Figure 5.

The specific steps of RQGA-BP are as follows. (1) Code the parameters: set the weights and thresholds as genes, each expressed by a real number, so evolution operates directly on the weights and thresholds. (2) Generate the initial population: the range of each gene is (−0.5, 0.5) because the weights of a good network are small. (3) Calculate fitness: the goal of BPNN is to make the residual error between forecast value and expected value as small as possible, so the norm of the error matrix between expected and forecast values is set to be the output of the objective function; a rank-based fitness assignment with linear ranking and selective pressure 2 is adopted to estimate fitness. (4) Evolve the population with RQGA and calculate the fitness of the new individuals. (5) The genes of the best individual are the optimal weights and thresholds and are used in the prediction neural network.
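A minimal sketch of the objective in step (3), assuming a flat real-coded individual decoded into the four parameter blocks and a sigmoid transfer function; the decoding layout is an assumption for illustration:

```python
import numpy as np

def decode(ind, n, p, q):
    """Split a flat individual into w (p, n), theta (p,), v (q, p), gamma (q,)."""
    i = 0
    w = ind[i:i + p * n].reshape(p, n); i += p * n
    theta = ind[i:i + p];               i += p
    v = ind[i:i + q * p].reshape(q, p); i += q * p
    gamma = ind[i:i + q]
    return w, theta, v, gamma

def objective(ind, X, T, n, p, q):
    """Norm of the error matrix between expected and forecast values."""
    w, theta, v, gamma = decode(ind, n, p, q)
    b = 1.0 / (1.0 + np.exp(-(X @ w.T - theta)))   # hidden layer
    c = 1.0 / (1.0 + np.exp(-(b @ v.T - gamma)))   # output layer
    return np.linalg.norm(T - c)                   # smaller norm = fitter
```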

5. Case Analysis

Two cases are used to verify the performance of RQGA-BP.

5.1. Case One

Apply RQGA-BP to fault detection for machinery parts. Because faults of mechanical parts are not directly observable, the possible states can be inferred by measuring related characteristic parameters. Fifteen characteristic parameters of the parts are selected and the data are normalized. The training sample data and the test sample data are shown in Tables 2 and 3, respectively.

The formula $p = 2n + 1$, where $n$ is the number of input-layer neurons, is adopted to calculate the number of neurons in the hidden layer. The transfer function of the neurons in the hidden layer is the S-shaped tangent function; the transfer function of the neurons in the output layer is the S-shaped logarithmic function. The states of the parts are divided into three situations, and the output form of the three situations is as follows:

Normal: $(1, 0, 0)$; Crack: $(0, 1, 0)$; Defect: $(0, 0, 1)$.

The number of nodes of the input layer is 15, of the output layer 3, and of the hidden layer 31. So the number of weights is $15 \times 31 + 31 \times 3 = 558$, the number of thresholds is $31 + 3 = 34$, and the total number of parameters to be optimized is 592. The number of training epochs is 1000, the training goal is 0.01, and the learning rate is 0.1. The norm of the test error on the test sample is set as the measure of the generalization capability of the network. The fitness value of an individual is calculated from the norm of the error: the smaller the error, the larger the fitness of the individual.
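As a quick check of these counts (plain arithmetic, no assumptions beyond the layer sizes):

```python
n_in, n_out = 15, 3
n_hidden = 2 * n_in + 1                         # the 2n + 1 rule -> 31
n_weights = n_in * n_hidden + n_hidden * n_out  # 465 + 93 = 558
n_thresholds = n_hidden + n_out                 # 31 + 3 = 34
print(n_weights + n_thresholds)                 # 592 parameters in total
```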

The initial code of RQGA-BP is as follows: each initial real gene $x_{ij}^0$ is a random number in the range $(-0.5, 0.5)$, and the initial amplitudes are $\alpha_{ij}^0 = \beta_{ij}^0 = 0.707$.

Apply BP, GA-BP, and RQGA-BP to process the data. The size of the population is 40, and the maximal evolution generation is 50. Table 4 shows the results of performing each algorithm 10 times, where Error 1 stands for the training error and Error 2 for the test error. We can conclude that RQGA-BP performs better than GA-BP on both training and prediction data, and GA-BP performs better than BP.

Apply the test of the difference of means of two normal populations (the $t$-test) to analyze the errors of BP, GA-BP, and RQGA-BP.

The error data of the three methods can be considered samples from normal populations $N(\mu, \sigma^2)$. The sample means of the three methods are $\bar{X}_1$, $\bar{X}_2$, and $\bar{X}_3$, respectively, and the sample variances are $S_1^2$, $S_2^2$, and $S_3^2$. The $t$-statistic is introduced as the test statistic:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_w\sqrt{1/n_1 + 1/n_2}}, \qquad S_w^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2},$$

where $n_1 = n_2 = 10$ are the sample sizes and $S_w$ is the pooled standard deviation.

The hypotheses $H_0: \mu_1 \le \mu_2$ and $H_1: \mu_1 > \mu_2$ need to be tested. The form of the rejection region is $t \ge k$. Requiring that the probability of rejecting $H_0$ when $H_0$ is true not exceed the significance level $\alpha$ gives $k = t_{\alpha}(n_1 + n_2 - 2)$, so the rejection region is $t \ge t_{\alpha}(n_1 + n_2 - 2) = t_{\alpha}(18)$.

The means and variances of Error 1 of BP, GA-BP, and RQGA-BP are computed from the ten runs in Table 4, and the $t$-statistics for the BP versus GA-BP and GA-BP versus RQGA-BP comparisons are obtained.

The BP versus GA-BP statistic falls in the rejection region at $\alpha = 0.005$, so $H_0$ is rejected; that is, Error 1 of BP is larger than that of GA-BP with probability of more than 99.5%.

The GA-BP versus RQGA-BP statistic falls in the rejection region at $\alpha = 0.025$, so $H_0$ is rejected; that is, Error 1 of GA-BP is larger than that of RQGA-BP with probability of more than 97.5%.

The means and variances of Error 2 of the three methods are computed from Table 4 in the same way, and the corresponding $t$-statistics are obtained.

The BP versus GA-BP statistic falls in the rejection region at $\alpha = 0.005$, so $H_0$ is rejected; that is, Error 2 of BP is larger than that of GA-BP with probability of more than 99.5%.

The GA-BP versus RQGA-BP statistic falls in the rejection region at $\alpha = 0.10$, so $H_0$ is rejected; that is, Error 2 of GA-BP is larger than that of RQGA-BP with probability of more than 90%.
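A minimal sketch of the pooled two-sample $t$-statistic used above; the error arrays stand in for the ten recorded runs of each method:

```python
import numpy as np

def pooled_t(sample1, sample2):
    """Two-sample t-statistic with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    s1 = np.var(sample1, ddof=1)
    s2 = np.var(sample2, ddof=1)
    sw = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (np.mean(sample1) - np.mean(sample2)) / (sw * np.sqrt(1/n1 + 1/n2))

# errors_bp and errors_gabp would hold the ten Error 1 values from Table 4;
# H0 is rejected at level alpha when pooled_t(...) >= t_alpha(18),
# e.g. t_0.005(18) = 2.878.
```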

Select one evolution run of GA-BP and one of RQGA-BP at random; the change of error over the run is shown in Figure 6. The x-axis represents the evolutionary generation; the y-axis represents the error. The solid line represents the evolutionary process of GA-BP; the dotted line represents that of RQGA-BP. The figure shows that the final error of RQGA-BP is smaller and its evolution is faster.

5.2. Case Two

Apply RQGA-BP to forecast the gasoline octane number. The traditional experimental method of measuring the octane number has several disadvantages, for example, the large dosage of sample, long experimental period, and high cost. Near-infrared spectroscopy (NIR) offers lower cost, no pollution, nondestructive testing, and online analysis. A Fourier transform near-infrared spectrometer is used to scan 60 groups of gasoline samples. The scanning range is 900–1700 nm, the scanning interval is 2 nm, and each sample contains 401 wavelength points. The NIR diagram of the samples is shown in Figure 7.

Utilize the laboratory testing method to measure the octane number of each sample. Apply GA-BPNN and RQGA-BPNN to set up mathematical models between the near-infrared spectrum and the octane number, respectively. The number of neurons of the input layer is 401; the number of neurons of the output layer is 1. Since the number of input neurons is large, the formula $p = \log_2 n$, rounded up to the next integer, is adopted to set the number of hidden neurons to 9. The number of training epochs is 1000, the training target is 0.001, and the learning rate is 0.01.
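A quick check of the hidden-layer size, assuming the logarithm is rounded up to the next integer:

```python
import math

n_in = 401
n_hidden = math.ceil(math.log2(n_in))   # log2(401) ≈ 8.65, so 9 neurons
```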

The training set and test set are generated by random selection from the 60 samples. The number of training samples is 50, and the number of test samples is 10. Since the training and test sets are randomly generated each time, the results of different runs may differ from each other.

After the test, evaluate the generalization ability of the network by calculating the deviation between predicted and true values. The sum of relative error $E$ and the coefficient of determination $R^2$ are chosen to evaluate the generalization ability. Their calculation formulas are, respectively,

$$E = \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|, \qquad R^2 = \frac{\left( n \sum_{i=1}^{n} \hat{y}_i y_i - \sum_{i=1}^{n} \hat{y}_i \sum_{i=1}^{n} y_i \right)^2}{\left( n \sum_{i=1}^{n} \hat{y}_i^2 - \left( \sum_{i=1}^{n} \hat{y}_i \right)^2 \right) \left( n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right)},$$

where $\hat{y}_i$ is the prediction value of the $i$th sample, $y_i$ is the true value of the $i$th sample, and $n$ is the number of samples. The smaller the sum of relative error the better, and the larger the coefficient of determination the better. Run GA-BPNN and RQGA-BPNN 10 times each; the results are shown in Table 5.
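A minimal sketch of both metrics, assuming the product-moment form of $R^2$ written above:

```python
import numpy as np

def sum_relative_error(pred, true):
    """Sum of per-sample relative errors; smaller is better."""
    return np.sum(np.abs((pred - true) / true))

def r_squared(pred, true):
    """Coefficient of determination; closer to 1 is better."""
    n = len(true)
    num = (n * np.sum(pred * true) - np.sum(pred) * np.sum(true)) ** 2
    den = ((n * np.sum(pred ** 2) - np.sum(pred) ** 2)
           * (n * np.sum(true ** 2) - np.sum(true) ** 2))
    return num / den
```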

Apply the $t$-test method of Case One to compare the $E$ and $R^2$ values of GA-BP and RQGA-BP.

The mean and variance of $E$ for GA-BP and RQGA-BP are computed from the ten runs in Table 5, and the corresponding $t$-statistic is obtained.

The statistic falls in the rejection region at $\alpha = 0.025$, so $H_0$ is rejected; that is, $E$ of GA-BP is larger than that of RQGA-BP with probability of more than 97.5%.

The mean and variance of $R^2$ for GA-BP and RQGA-BP are likewise computed from Table 5, and the corresponding $t$-statistic is obtained.

The statistic supports the alternative at $\alpha = 0.005$; that is, $R^2$ of GA-BP is smaller than that of RQGA-BP with probability of more than 99.5%.

As can be seen from the data in the table, the seventh result of GA-BP is better than that of RQGA-BP; in all remaining runs, the results of RQGA-BP are better than those of GA-BP.

Taking the fourth run as the example, the specific calculation results are shown in Table 6.

The comparison of the prediction results of the two methods with the true values in the fourth experiment is shown in Figure 8.

It can be seen that both GA-BP and RQGA-BP can predict the octane number, and the result of RQGA-BP is better.

6. Conclusions

The optimization of the weights and thresholds of BPNN is a numerical optimization problem. The purpose of optimizing BPNN with RQGA is to obtain better initial weights and thresholds through RQGA. An individual in RQGA represents the initial weights and thresholds of the network, and the norm of the test error of the prediction sample is the output of the objective function. Compared with the conventional BPNN, RQGA-BP has a higher convergence rate.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.