Convergence of Batch Split-Complex Backpropagation Algorithm for Complex-Valued Neural Networks
The batch split-complex backpropagation (BSCBP) algorithm for training complex-valued neural networks is considered. For a constant learning rate, it is proved that the error function of the BSCBP algorithm is monotone during the training iteration process and that the gradient of the error function tends to zero. Under an additional moderate condition, the weight sequence itself is also proved to be convergent. A numerical example is given to support the theoretical analysis.
1. Introduction
Neural networks are widely used in the fields of control, signal processing, and time series analysis. The parameters of traditional neural networks are usually real numbers, suited to real-valued signals. However, complex-valued signals also appear in practical applications. As a result, the complex-valued neural network (CVNN), whose weights, threshold values, and input and output signals are all complex numbers, has been proposed [2, 3]. CVNN has been extensively used in processing complex-valued signals. By encoding real-valued signals into complex numbers, CVNN has also shown more powerful capability than real-valued neural networks in processing real-valued signals. For example, a two-layered CVNN can successfully solve the XOR problem, which cannot be solved by two-layered real-valued neural networks.
CVNN can be trained by two types of complex backpropagation (BP) algorithms: the fully complex BP algorithm and the split-complex BP algorithm. Unlike in the fully complex BP algorithm, the activation function in the split-complex BP algorithm is applied separately to the real part and the imaginary part [2–4, 7], which lets the split-complex BP algorithm avoid singular points in the adaptive training process. Complex BP algorithms can be run in either batch or online mode. In online training, weights are updated after the presentation of each training example, while in batch training, weights are not updated until all of the examples have been presented to the network. Compared with batch learning, online learning is hard to parallelize.
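The remark about singular points can be illustrated with a minimal sketch. It assumes tanh as the activation, which is an illustrative choice and not one fixed by the text; the names `split_tanh` and `full_tanh` are hypothetical. The analytic complex tanh has poles at i(π/2 + kπ), whereas applying the real tanh to the real and imaginary parts separately always gives a bounded value.

```python
import cmath
import math

def split_tanh(z: complex) -> complex:
    """Split-complex activation: real tanh applied to each part separately."""
    return complex(math.tanh(z.real), math.tanh(z.imag))

def full_tanh(z: complex) -> complex:
    """Fully complex activation: analytic tanh, with poles at i*(pi/2 + k*pi)."""
    return cmath.tanh(z)

# A point very close to the pole at i*pi/2 (illustrative choice).
near_pole = complex(0.0, math.pi / 2 - 1e-6)

full_mag = abs(full_tanh(near_pole))    # grows without bound near the pole
split_mag = abs(split_tanh(near_pole))  # each part lies in (-1, 1)

print(f"fully complex |tanh| near pole: {full_mag:.3e}")
print(f"split-complex |tanh| near pole: {split_mag:.3f}")
```

Because each part of the split activation lies in (-1, 1), its modulus can never exceed the square root of 2, wherever the input lies in the complex plane.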
The convergence of neural network learning algorithms is crucial for practical applications. The dynamical behaviors of many neural networks have been extensively analyzed [8, 9]. However, the existing convergence results for complex BP algorithms mainly concern the fully complex BP algorithm for two-layered CVNN (see, e.g., [10, 11]), and the convergence of the split-complex BP algorithm has seldom been investigated. Nitta used CVNN as a complex adaptive pattern classifier and presented some heuristic convergence results. The purpose of this paper is to give rigorous convergence results for the batch split-complex BP (BSCBP) algorithm for three-layered CVNN. The monotonicity of the error function during the training iteration process is also guaranteed.
The remainder of this paper is organized as follows. The three-layered CVNN model and the BSCBP algorithm are described in the next section. Section 3 presents the main convergence theorem. A numerical example is given in Section 4 to verify our theoretical results. The details of the convergence proof are provided in Section 5. Some conclusions are drawn in Section 6.
2. Network Structure and Learning Method
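Figure 1 shows the structure of the network considered in this paper. It is a three-layered CVNN consisting of $L$ input neurons, $M$ hidden neurons, and one output neuron. For any positive integer $k$, the set of all $k$-dimensional complex vectors is denoted by $\mathbb{C}^k$ and the set of all $k$-dimensional real vectors is denoted by $\mathbb{R}^k$. Let us write $w_m = w_m^R + i w_m^I = (w_{m1}, w_{m2}, \dots, w_{mL})^T \in \mathbb{C}^L$ as the weight vector between the input neurons and the $m$th hidden neuron, where $w_{ml} = w_{ml}^R + i w_{ml}^I$, $w_{ml}^R$ and $w_{ml}^I \in \mathbb{R}^1$, $i = \sqrt{-1}$, and $m = 1, \dots, M$. Similarly, write $w_{M+1} = w_{M+1}^R + i w_{M+1}^I = (w_{M+1,1}, w_{M+1,2}, \dots, w_{M+1,M})^T \in \mathbb{C}^M$ as the weight vector between the hidden neurons and the output neuron, where $w_{M+1,m} = w_{M+1,m}^R + i w_{M+1,m}^I$, $m = 1, \dots, M$. For simplicity, all the weight vectors are incorporated into a total weight vector
\[ W = \big( (w_1)^T, (w_2)^T, \dots, (w_{M+1})^T \big)^T \in \mathbb{C}^{M(L+1)}. \tag{2.1} \]
For input signals $z = (z_1, z_2, \dots, z_L)^T = x + i y \in \mathbb{C}^L$, where $x = (x_1, \dots, x_L)^T \in \mathbb{R}^L$ and $y = (y_1, \dots, y_L)^T \in \mathbb{R}^L$, the input of the $m$th hidden neuron is
\[ U_m = U_m^R + i U_m^I = \big( w_m^R \cdot x - w_m^I \cdot y \big) + i \big( w_m^I \cdot x + w_m^R \cdot y \big). \tag{2.2} \]
Here "$\cdot$" denotes the inner product of two vectors.

In order to train the network with the BSCBP algorithm, we consider the following popular real-imaginary-type activation function: for any $U = U^R + i U^I \in \mathbb{C}^1$,
\[ f_C(U) = f(U^R) + i f(U^I), \tag{2.3} \]
where $f$ is a real function (e.g., a sigmoid function). If we simply write $f$ for $f_C$, the output $H_m$ of the hidden neuron $m$ is given by
\[ H_m = H_m^R + i H_m^I = f(U_m^R) + i f(U_m^I). \tag{2.4} \]
Similarly, the input of the output neuron is
\[ S = S^R + i S^I = \big( w_{M+1}^R \cdot H^R - w_{M+1}^I \cdot H^I \big) + i \big( w_{M+1}^I \cdot H^R + w_{M+1}^R \cdot H^I \big), \tag{2.5} \]
and the output of the network is given by
\[ O = O^R + i O^I = g(S^R) + i g(S^I), \tag{2.6} \]
where $H^R = (H_1^R, \dots, H_M^R)^T$, $H^I = (H_1^I, \dots, H_M^I)^T$, and $g$ is a real function.

We remark that, in practice, thresholds should be included in the above formulas for the output and hidden neurons. Here we have omitted them to simplify the presentation and deduction.

Let the network be supplied with a given set of training examples $\{z^q, d^q\}_{q=1}^{Q} \subset \mathbb{C}^L \times \mathbb{C}^1$. For each input $z^q = x^q + i y^q$ $(1 \le q \le Q)$ from the training set, we write $U_m^q = U_m^{q,R} + i U_m^{q,I}$ as the input for the hidden neuron $m$, $H_m^q = H_m^{q,R} + i H_m^{q,I}$ as the output for the hidden neuron $m$, $S^q = S^{q,R} + i S^{q,I}$ as the input to the output neuron, and $O^q = O^{q,R} + i O^{q,I}$ as the actual output. The square error function of CVNN trained by the BSCBP algorithm can be represented as follows:
\[ E(W) = \frac{1}{2} \sum_{q=1}^{Q} (O^q - d^q)(O^q - d^q)^* = \frac{1}{2} \sum_{q=1}^{Q} \Big[ (O^{q,R} - d^{q,R})^2 + (O^{q,I} - d^{q,I})^2 \Big] = \sum_{q=1}^{Q} \Big[ \mu_{qR}(S^{q,R}) + \mu_{qI}(S^{q,I}) \Big], \tag{2.7} \]
where "*" signifies complex conjugate, and
\[ \mu_{qR}(t) = \frac{1}{2} \big( g(t) - d^{q,R} \big)^2, \qquad \mu_{qI}(t) = \frac{1}{2} \big( g(t) - d^{q,I} \big)^2, \qquad t \in \mathbb{R}^1, \ 1 \le q \le Q. \tag{2.8} \]
The purpose of the network training is to find $W^*$ which minimizes $E(W)$. The gradient method is often used to solve this minimization problem. Writing
\[ H^q = H^{q,R} + i H^{q,I} = (H_1^{q,R}, \dots, H_M^{q,R})^T + i (H_1^{q,I}, \dots, H_M^{q,I})^T \tag{2.9} \]
and differentiating $E(W)$ with respect to the real parts and imaginary parts of the weight vectors, respectively, give
\[ \frac{\partial E(W)}{\partial w_{M+1}^R} = \sum_{q=1}^{Q} \Big[ \mu_{qR}'(S^{q,R}) H^{q,R} + \mu_{qI}'(S^{q,I}) H^{q,I} \Big], \tag{2.10} \]
\[ \frac{\partial E(W)}{\partial w_{M+1}^I} = \sum_{q=1}^{Q} \Big[ -\mu_{qR}'(S^{q,R}) H^{q,I} + \mu_{qI}'(S^{q,I}) H^{q,R} \Big], \tag{2.11} \]
\[ \frac{\partial E(W)}{\partial w_m^R} = \sum_{q=1}^{Q} \Big[ \mu_{qR}'(S^{q,R}) \big( w_{M+1,m}^R f'(U_m^{q,R}) x^q - w_{M+1,m}^I f'(U_m^{q,I}) y^q \big) + \mu_{qI}'(S^{q,I}) \big( w_{M+1,m}^I f'(U_m^{q,R}) x^q + w_{M+1,m}^R f'(U_m^{q,I}) y^q \big) \Big], \tag{2.12} \]
\[ \frac{\partial E(W)}{\partial w_m^I} = \sum_{q=1}^{Q} \Big[ \mu_{qR}'(S^{q,R}) \big( -w_{M+1,m}^R f'(U_m^{q,R}) y^q - w_{M+1,m}^I f'(U_m^{q,I}) x^q \big) + \mu_{qI}'(S^{q,I}) \big( -w_{M+1,m}^I f'(U_m^{q,R}) y^q + w_{M+1,m}^R f'(U_m^{q,I}) x^q \big) \Big], \tag{2.13} \]
where $m = 1, \dots, M$ in (2.12) and (2.13). Starting from an arbitrary initial value $W^0$ at time $0$, the BSCBP algorithm updates the weight vector $W$ iteratively by
\[ W^{n+1} = W^n + \Delta W^n, \qquad n = 0, 1, \dots, \tag{2.14} \]
where $\Delta W^n = \big( (\Delta w_1^n)^T, \dots, (\Delta w_{M+1}^n)^T \big)^T$, with
\[ \Delta w_m^n = -\eta \left( \frac{\partial E(W^n)}{\partial w_m^R} + i \frac{\partial E(W^n)}{\partial w_m^I} \right), \qquad m = 1, \dots, M+1. \tag{2.15} \]
Here $\eta > 0$ stands for the learning rate. Obviously, we can rewrite (2.14) and (2.15) by dealing with the real parts and the imaginary parts of the weights separately:
\[ \Delta w_m^{n,R} = w_m^{n+1,R} - w_m^{n,R} = -\eta \frac{\partial E(W^n)}{\partial w_m^R}, \qquad \Delta w_m^{n,I} = w_m^{n+1,I} - w_m^{n,I} = -\eta \frac{\partial E(W^n)}{\partial w_m^I}, \tag{2.16} \]
where $m = 1, \dots, M+1$.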
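The forward pass and the split-complex gradient described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tanh is assumed for both real activation functions, the two training pairs and the initial weights are hypothetical values, and the names `forward`, `error`, `gradient`, and `fd` are invented for this sketch. Each gradient is packed as one complex number, the derivative with respect to the real part plus i times the derivative with respect to the imaginary part, so that one batch update is simply `w - eta * grad`. A central finite difference is used to check one gradient entry.

```python
import math

def f(t):   # real activation (assumed: tanh), used for hidden and output layers
    return math.tanh(t)

def df(t):  # derivative of the assumed activation
    return 1.0 - math.tanh(t) ** 2

def forward(wh, wo, z):
    """wh: M lists of L complex weights; wo: M complex weights; z: L complex inputs."""
    U = [sum(w * zl for w, zl in zip(wm, z)) for wm in wh]  # hidden inputs
    H = [complex(f(u.real), f(u.imag)) for u in U]          # split activation
    S = sum(w * h for w, h in zip(wo, H))                   # output-neuron input
    O = complex(f(S.real), f(S.imag))                       # network output
    return U, H, S, O

def error(wh, wo, samples):
    """Square error summed over the whole batch."""
    return sum(0.5 * abs(forward(wh, wo, z)[3] - d) ** 2 for z, d in samples)

def gradient(wh, wo, samples):
    """Batch gradients, packed as dE/dw^R + i * dE/dw^I."""
    M, L = len(wh), len(wh[0])
    gh = [[0j] * L for _ in range(M)]
    go = [0j] * M
    for z, d in samples:
        U, H, S, O = forward(wh, wo, z)
        # Derivatives of the two real error terms with respect to S^R and S^I.
        delta = complex((O.real - d.real) * df(S.real),
                        (O.imag - d.imag) * df(S.imag))
        for m in range(M):
            go[m] += delta * H[m].conjugate()
            back = delta * wo[m].conjugate()
            local = complex(back.real * df(U[m].real),
                            back.imag * df(U[m].imag))
            for l in range(L):
                gh[m][l] += local * z[l].conjugate()
    return gh, go

# Hypothetical data: one input neuron, two hidden neurons, two training pairs.
samples = [([0.5 + 0.2j], 0.3 - 0.1j), ([-0.4 + 0.7j], -0.2 + 0.5j)]
wh = [[0.1 - 0.2j], [-0.3 + 0.1j]]
wo = [0.2 + 0.1j, -0.1 + 0.3j]
gh, go = gradient(wh, wo, samples)

def fd(direction, h=1e-6):
    """Central difference of E along a perturbation of the weight wh[0][0]."""
    plus = [[wh[0][0] + h * direction], wh[1]]
    minus = [[wh[0][0] - h * direction], wh[1]]
    return (error(plus, wo, samples) - error(minus, wo, samples)) / (2 * h)

fd_real, fd_imag = fd(1.0), fd(1j)
print(abs(fd_real - gh[0][0].real), abs(fd_imag - gh[0][0].imag))
```

The complex packing is only a bookkeeping convenience: the real and imaginary parts of each weight are still treated as independent real parameters, as in the split-complex formulation.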
3. Main Results
Throughout the paper, $\|\cdot\|$ denotes the usual Euclidean norm. We need the following assumptions:
(A1) there exists a constant such that
(A2) there exists a constant such that and for all
(A3) the set contains only finitely many points.
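Theorem 3.1. Suppose that Assumptions (A1) and (A2) are valid and that $\{W^n\}$ is the weight vector sequence generated by (2.14)–(2.16) with arbitrary initial values $W^0$. If the learning rate $\eta$ satisfies an upper bound given in terms of a constant defined in (5.21) below, then one has
(i) $E(W^{n+1}) \le E(W^n)$, $n = 0, 1, 2, \dots$;
(ii) $\lim_{n \to \infty} \|\partial E(W^n) / \partial w_m^R\| = 0$ and $\lim_{n \to \infty} \|\partial E(W^n) / \partial w_m^I\| = 0$ for every weight vector $w_m$.
Furthermore, if Assumption (A3) also holds, then there exists a stationary point $W^*$ of the error function such that
(iii) $\lim_{n \to \infty} W^n = W^*$.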
Statement (i) gives the monotonicity of the error function during the learning process. Statement (ii) gives the convergence to zero of the gradients of the error function with respect to the real parts and the imaginary parts of the weights. Statement (iii) says that if the number of stationary points is finite, the weight sequence converges to one of them, that is, to a point where the gradient of the error function vanishes.
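Statement (i) can be observed numerically with a minimal sketch of the batch update loop. Everything specific here is an assumption made for illustration: tanh as the real activation, the two training pairs, the initial weights, and the learning rate 0.01, chosen small so that it is plausibly below the bound required by the theorem. The function names are hypothetical.

```python
import math

def act(t):
    return math.tanh(t)              # assumed real activation

def dact(t):
    return 1.0 - math.tanh(t) ** 2   # its derivative

def split(fn, c):
    """Apply a real function to the real and imaginary parts separately."""
    return complex(fn(c.real), fn(c.imag))

def error_and_gradient(wh, wo, samples):
    """Return E(W) and the gradients packed as dE/dw^R + i * dE/dw^I."""
    M, L = len(wh), len(wh[0])
    gh = [[0j] * L for _ in range(M)]
    go = [0j] * M
    err = 0.0
    for z, d in samples:
        U = [sum(w * zl for w, zl in zip(wm, z)) for wm in wh]
        H = [split(act, u) for u in U]
        S = sum(w * h for w, h in zip(wo, H))
        O = split(act, S)
        err += 0.5 * abs(O - d) ** 2
        delta = complex((O.real - d.real) * dact(S.real),
                        (O.imag - d.imag) * dact(S.imag))
        for m in range(M):
            go[m] += delta * H[m].conjugate()
            back = delta * wo[m].conjugate()
            local = complex(back.real * dact(U[m].real),
                            back.imag * dact(U[m].imag))
            for l in range(L):
                gh[m][l] += local * z[l].conjugate()
    return err, gh, go

def train(wh, wo, samples, eta, steps):
    """Batch training: one weight update per pass over all samples."""
    errors, grad_norms = [], []
    for _ in range(steps):
        err, gh, go = error_and_gradient(wh, wo, samples)
        errors.append(err)
        sq = sum(abs(v) ** 2 for row in gh for v in row)
        sq += sum(abs(v) ** 2 for v in go)
        grad_norms.append(math.sqrt(sq))
        wh = [[w - eta * v for w, v in zip(wm, gm)] for wm, gm in zip(wh, gh)]
        wo = [w - eta * v for w, v in zip(wo, go)]
    return errors, grad_norms

# Hypothetical training pairs and initial weights (not the paper's XOR data).
samples = [([0.5 + 0.2j], 0.3 - 0.1j), ([-0.4 + 0.7j], -0.2 + 0.5j)]
wh0 = [[0.1 - 0.2j], [-0.3 + 0.1j]]
wo0 = [0.2 + 0.1j, -0.1 + 0.3j]
errors, grad_norms = train(wh0, wo0, samples, eta=0.01, steps=200)
print(errors[0], errors[-1], grad_norms[-1])
```

Increasing `eta` well beyond the admissible range would be expected to break the monotone decrease, which is the practical content of the learning-rate condition.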
4. Numerical Example
In this section, we illustrate the convergence behavior of the BSCBP algorithm with a simple numerical example. The well-known XOR problem is a benchmark in the neural network literature. As in , the training samples of the encoded XOR problem for CVNN are presented as follows:
This example uses a network with one input neuron, three hidden neurons, and one output neuron. The transfer function is in MATLAB, which is a commonly used sigmoid function. The learning rate is set to be . We carry out the test with the initial components of the weights stochastically chosen in . Figure 2 shows that the gradient tends to zero and that the square error decreases monotonically as the number of iterations increases, eventually tending to a constant. This supports our theoretical findings.
5. Proofs
In this section, we first present two lemmas; then, we use them to prove the main theorem.
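Lemma 5.1. Suppose that the function $F : \Phi \subset \mathbb{R}^p \to \mathbb{R}^1$ is continuous and differentiable on a compact set $\Phi$ and that $\Omega = \{ v \in \Phi \mid \nabla F(v) = 0 \}$ contains only finitely many points. If a sequence $\{v^n\}_{n=1}^{\infty} \subset \Phi$ satisfies
\[ \lim_{n \to \infty} \|v^{n+1} - v^n\| = 0, \qquad \lim_{n \to \infty} \|\nabla F(v^n)\| = 0, \tag{5.1} \]
then there exists a point $v^* \in \Omega$ such that $\lim_{n \to \infty} v^n = v^*$.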
Proof. This result is almost the same as [13, Theorem 14.1.5], and the details of the proof are omitted.
For any and , write
Lemma 5.2. Suppose that Assumptions (A1) and (A2) hold. Then for any and , one has where are constants independent of and , each lies on the segment between and , and each lies on the segment between and .
Proof. The validity of (5.3) follows easily from (2.4)–(2.6) when the set of samples is fixed and Assumptions (A1) and (A2) are satisfied. By (2.8), we have
Then (5.4) follows directly from Assumption (A1) by defining .
It follows from (5.2), Assumption (A1), the Mean-Value Theorem, and the Cauchy-Schwarz inequality that for any and , where and each is on the segment between and for . Similarly, we can get Thus, we have (5.5).
By (2.10), (2.11), (2.16), and (5.2), we have
Next, we prove (5.7). By (2.2), (2.4), (5.2), and Taylor's formula, for any , , and , we have where is an intermediate point on the line segment between the two points and , and between the two points and . Thus, according to (2.12), (2.13), (2.16), (5.2), (5.14), and (5.15), we have where Using Assumptions (A1) and (A2), (5.4), and the triangle inequality, we immediately get where . Now, (5.7) results from (5.16) and (5.18).
According to (5.2), (5.4), and (5.5), we have where and . So we obtain (5.8) and (5.9).
Now, we are ready to prove Theorem 3.1 in terms of the above two lemmas.
Proof of Theorem 3.1. (i) By (5.6)–(5.9) and Taylor's formula, we have where , is on the segment between and , and is on the segment between and . Then we have Obviously, by choosing the learning rate to satisfy that then we have
(ii) According to (2.16), we have Combining with (5.21), we have where . Since , there holds that Let , then So there holds that which implies that
(iii) Write then can be viewed as a function of , which is denoted as . That is to say Obviously, is a continuously differentiable real-valued function and Let then by (5.30) and (5.31), we have Thus we have We use (2.16), (5.30), and (5.31) to obtain This leads to Furthermore, from Assumption (A3) we know that the set contains only finitely many points. Thus, the sequence here satisfies all the conditions needed in Lemma 5.1. As a result, there is a which satisfies that . Since consists of the real and imaginary parts of , we know that there is a such that . We thus complete the proof.
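The mechanism behind parts (i) and (ii) can be summarized in a generic form. The following is a standard sketch and not the paper's exact estimate: it assumes that the error function, viewed as a function $E(v)$ of the real vector $v$ collecting the real and imaginary parts of all weights, has a gradient that is Lipschitz continuous with a constant $K$. The symbols $v$ and $K$ are introduced here for illustration only and stand in for the paper's own constants.

```latex
% Batch update written on the stacked real vector: v^{n+1} = v^n - \eta \nabla E(v^n).
\begin{aligned}
E(v^{n+1})
  &\le E(v^n) + \nabla E(v^n) \cdot (v^{n+1} - v^n) + \frac{K}{2}\,\|v^{n+1} - v^n\|^2 \\
  &=  E(v^n) - \eta \Bigl(1 - \frac{K \eta}{2}\Bigr) \|\nabla E(v^n)\|^2 .
\end{aligned}
% If 0 < \eta < 2/K, the bracket is positive, so E(v^{n+1}) \le E(v^n).
% Summing over n and using E \ge 0:
\eta \Bigl(1 - \frac{K \eta}{2}\Bigr) \sum_{n=0}^{\infty} \|\nabla E(v^n)\|^2
  \le E(v^0) < \infty
\quad\Longrightarrow\quad
\lim_{n \to \infty} \|\nabla E(v^n)\| = 0 .
```

Since $v^{n+1} - v^n = -\eta \nabla E(v^n)$, the same bound also gives $\|v^{n+1} - v^n\| \to 0$, which is the other condition required by Lemma 5.1.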
6. Conclusion
In this paper, some convergence results of the BSCBP algorithm for CVNN are presented. An upper bound on the learning rate is given that guarantees both the monotonicity of the error function and the convergence of its gradients to zero. It is also proved that the network weight vector converges to a stationary point of the error function if the error function has only finitely many stationary points. A numerical example is given to support the theoretical findings. Our work can help neural network researchers choose an appropriate activation function and learning rate to guarantee convergence when they use the BSCBP algorithm to train CVNN. We mention that the convergence results can be extended to the more general case in which the networks have several outputs and hidden layers.
Acknowledgments
The authors wish to thank the Associate Editor and the anonymous reviewers for their helpful and interesting comments. This work is supported by the National Science Foundation of China (10871220).
A. Hirose, Complex-Valued Neural Networks, Springer, New York, NY, USA, 2006.