Convergence of an Online Split-Complex Gradient Algorithm for Complex-Valued Neural Networks

Zhang, Huisheng; Xu, Dongpo; Wang, Zhiping

DOI: https://doi.org/10.1155/2010/829692

Discrete Dynamics in Nature and Society


Research Article | Open Access

Volume 2010 | Article ID 829692 | https://doi.org/10.1155/2010/829692


Academic Editor: Manuel De La Sen

Received: 01 Sept 2009

Accepted: 19 Jan 2010

Published: 21 Mar 2010

Abstract

The online gradient method has been widely used in training neural networks. In this paper we consider an online split-complex gradient algorithm for complex-valued neural networks, with an adaptive learning rate chosen during the training procedure. Under certain conditions, by first showing the monotonicity of the error function, we prove that the gradient of the error function tends to zero and the weight sequence tends to a fixed point. A numerical example is given to support the theoretical findings.

1. Introduction

In recent years, neural networks have been widely used because of their outstanding capability of approximating nonlinear models. As an important search method in optimization theory, the gradient algorithm has been applied in various engineering fields, such as adaptive control and recursive parameter estimation [1–3]. The gradient algorithm is also a popular training method for neural networks (when used to train neural networks with hidden layers, it is also called the backpropagation (BP) algorithm), and training can be done either in the online or in the batch mode [4]. In online training, the weights are updated after the presentation of each training example, while in batch training, the weights are not updated until all of the examples have been presented to the network. As a result, batch gradient training is typically used when the number of training samples is relatively small, whereas online gradient training is preferred when a very large number of training samples is available.
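The only structural difference between the two modes is where the weight update sits relative to the loop over training examples. The minimal Python sketch below shows this for a real-valued linear least-squares model, which is an illustrative choice and not a network from this paper; the data, learning rates, and function names are hypothetical.

```python
import random


def predict(w, x):
    """Linear model output w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def grad_single(w, x, d):
    """Gradient of 0.5 * (w . x - d)**2 with respect to w for one example."""
    err = predict(w, x) - d
    return [err * xi for xi in x]


def loss(w, data):
    """Sum of squared errors over the whole training set."""
    return 0.5 * sum((predict(w, x) - d) ** 2 for x, d in data)


def online_epoch(w, data, lr):
    """Online mode: update the weights right after each example."""
    for x, d in data:
        g = grad_single(w, x, d)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def batch_epoch(w, data, lr):
    """Batch mode: accumulate the gradient over all examples, then update once."""
    total = [0.0] * len(w)
    for x, d in data:
        g = grad_single(w, x, d)
        total = [ti + gi for ti, gi in zip(total, g)]
    return [wi - lr * ti for wi, ti in zip(w, total)]


# Hypothetical noise-free data from a "true" weight vector; the second input is a bias.
rng = random.Random(0)
true_w = [1.5, -2.0]
data = []
for _ in range(20):
    x = [rng.uniform(-1.0, 1.0), 1.0]
    data.append((x, predict(true_w, x)))

w0 = [0.0, 0.0]
w_online, w_batch = w0, w0
for _ in range(200):
    w_online = online_epoch(w_online, data, lr=0.1)  # 20 updates per epoch
    w_batch = batch_epoch(w_batch, data, lr=0.02)    # 1 update per epoch
```

The batch learning rate is chosen smaller here because its step uses a gradient summed over all 20 examples.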

Conventional neural networks usually have real-valued parameters and deal with real-valued signals [5, 6]. In many applications, however, the inputs and outputs of a system are best described as complex-valued signals, and processing is done in complex space. To handle such problems in the complex domain, complex-valued neural networks (CVNNs), which extend the usual real-valued neural networks to complex numbers, have been proposed in recent years [7–9]. Accordingly, gradient training has been generalized to CVNNs in two ways: the fully complex gradient algorithm [10–12] and the split-complex gradient algorithm [13, 14], both of which can be run in online or batch mode. It has been pointed out that the split-complex gradient algorithm can avoid the problems resulting from singular points [14].
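The singular-point issue can be illustrated with the hyperbolic tangent, which is our illustrative choice of activation. Extended analytically to the complex plane (the fully complex approach), tanh z = sinh z / cosh z has poles wherever cosh z = 0, for example at z = iπ/2; by Liouville's theorem, any nonconstant analytic function on the whole plane is unbounded. A split (real-imaginary type) activation instead applies the real tanh to the real and imaginary parts separately, so its modulus stays below √2. A minimal Python sketch:

```python
import cmath
import math


def fully_complex_tanh(z):
    """Fully complex activation: tanh extended analytically to the complex plane."""
    return cmath.tanh(z)


def split_tanh(z):
    """Split-complex activation: real tanh applied to Re(z) and Im(z) separately."""
    return complex(math.tanh(z.real), math.tanh(z.imag))


# A point just below the pole of tanh at z = i*pi/2 (note tanh(iy) = i*tan(y)).
near_pole = complex(0.0, math.pi / 2 - 1e-6)

print(abs(fully_complex_tanh(near_pole)))  # very large, about 1e6
print(abs(split_tanh(near_pole)))          # stays below sqrt(2)
```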

Convergence is of primary importance for a training algorithm to be used successfully. There are extensive results on the convergence of gradient algorithms for real-valued neural networks in both the online and batch modes (see, e.g., [15, 16] and the references cited therein). In comparison, the convergence properties of complex gradient algorithms have seldom been investigated. We refer the reader to [11, 12] for some convergence results on fully complex gradient algorithms and to [17] for results on the batch split-complex gradient algorithm. However, to the best of our knowledge, a convergence analysis of the online split-complex gradient (OSCG) algorithm for complex-valued neural networks has not yet been established in the literature, and this is the primary concern of this paper. Under certain conditions, by first showing the monotonicity of the error function, we prove that the gradient of the error function tends to zero and the weight sequence tends to a fixed point. A numerical example is also given to support the theoretical findings.

The remainder of this paper is organized as follows. The CVNN model and the OSCG algorithm are described in the next section. Section 3 presents the main results, whose proofs are postponed to Section 4. In Section 5 we give a numerical example to support our theoretical findings. The paper ends with some conclusions in Section 6.

2. Network Structure and Learning Method

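It has been shown that a two-layered CVNN can solve many problems that cannot be solved by real-valued neural networks with fewer than three layers [13]. Thus, without loss of generality, this paper considers a two-layered CVNN consisting of $L$ input neurons and one output neuron. For any positive integer $d$, the set of all $d$-dimensional complex vectors is denoted by $\mathbb{C}^d$ and the set of all $d$-dimensional real vectors by $\mathbb{R}^d$. Let us write $\mathbf{w} = (w_1, \dots, w_L)^T = \mathbf{w}^R + i\mathbf{w}^I \in \mathbb{C}^L$ as the weight vector between the input neurons and the output neuron, where $w_l = w_l^R + i w_l^I$, $\mathbf{w}^R = (w_1^R, \dots, w_L^R)^T \in \mathbb{R}^L$, $\mathbf{w}^I = (w_1^I, \dots, w_L^I)^T \in \mathbb{R}^L$, and $l = 1, \dots, L$. For input signals $\mathbf{z} = (z_1, \dots, z_L)^T = \mathbf{x} + i\mathbf{y} \in \mathbb{C}^L$, where $z_l = x_l + i y_l$, $\mathbf{x} = (x_1, \dots, x_L)^T \in \mathbb{R}^L$, and $\mathbf{y} = (y_1, \dots, y_L)^T \in \mathbb{R}^L$, the input of the output neuron is

$$U = U^R + iU^I = \sum_{l=1}^{L} w_l z_l = \left(\mathbf{w}^R \cdot \mathbf{x} - \mathbf{w}^I \cdot \mathbf{y}\right) + i\left(\mathbf{w}^I \cdot \mathbf{x} + \mathbf{w}^R \cdot \mathbf{y}\right). \qquad (2.1)$$

Here "·" denotes the inner product of two vectors.

For the convenience of applying the OSCG algorithm to train the network, we consider the following popular real-imaginary-type activation function [13]:

$$f(U) = g(U^R) + i\,g(U^I) \qquad (2.2)$$

for any $U = U^R + iU^I \in \mathbb{C}$, where $g: \mathbb{R} \to \mathbb{R}$ is a real function (e.g., a sigmoid function). Denoting $f(U)$ simply by $O$, the network output is given by

$$O = O^R + iO^I = g(U^R) + i\,g(U^I). \qquad (2.3)$$

Let the network be supplied with a given set of training examples $\{\mathbf{z}^j, d^j\}_{j=1}^{J} \subset \mathbb{C}^L \times \mathbb{C}$, where $d^j = d^{j,R} + i d^{j,I}$. For each input $\mathbf{z}^j = \mathbf{x}^j + i\mathbf{y}^j$ from the training set, we write $U^j = U^{j,R} + iU^{j,I}$ as the input for the output neuron and $O^j = O^{j,R} + iO^{j,I}$ as the actual output. The square error function can be represented as follows:

$$E(\mathbf{w}) = \frac{1}{2}\sum_{j=1}^{J}\left(O^j - d^j\right)\left(O^j - d^j\right)^* = \sum_{j=1}^{J}\left[g_{jR}(U^{j,R}) + g_{jI}(U^{j,I})\right], \qquad (2.4)$$

where "*" signifies complex conjugate, and

$$g_{jR}(t) = \frac{1}{2}\left(g(t) - d^{j,R}\right)^2, \qquad g_{jI}(t) = \frac{1}{2}\left(g(t) - d^{j,I}\right)^2, \qquad t \in \mathbb{R},\ j = 1, \dots, J. \qquad (2.5)$$

The neural network training problem is to find weights that minimize the approximation error, and the gradient method is often used to solve this minimization problem. Differentiating $E$ with respect to the real parts and the imaginary parts of the weight vector, respectively, gives

$$\frac{\partial E}{\partial \mathbf{w}^R} = \sum_{j=1}^{J}\left[g'_{jR}(U^{j,R})\,\mathbf{x}^j + g'_{jI}(U^{j,I})\,\mathbf{y}^j\right], \qquad (2.6)$$

$$\frac{\partial E}{\partial \mathbf{w}^I} = \sum_{j=1}^{J}\left[-g'_{jR}(U^{j,R})\,\mathbf{y}^j + g'_{jI}(U^{j,I})\,\mathbf{x}^j\right]. \qquad (2.7)$$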

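The split form of the net input, the two expressions for the square error, and the gradient formulas can be checked numerically. The Python sketch below is a minimal check under stated assumptions: it takes U^R = w^R·x − w^I·y and U^I = w^I·x + w^R·y, the split activation O = g(U^R) + i g(U^I), g_jR(t) = (g(t) − d^{j,R})²/2, and the chain-rule gradients ∂E/∂w^R = Σ_j [g'_jR x^j + g'_jI y^j] and ∂E/∂w^I = Σ_j [−g'_jR y^j + g'_jI x^j]. It uses g = tanh for illustration, arbitrary hypothetical weights and samples, and helper names of our own; the analytic gradient is compared with central finite differences.

```python
import math


def dot(a, b):
    """Inner product of two real vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def net_input_split(w_re, w_im, x, y):
    """(U^R, U^I) with U^R = wR.x - wI.y and U^I = wI.x + wR.y."""
    return dot(w_re, x) - dot(w_im, y), dot(w_im, x) + dot(w_re, y)


def net_input_complex(w, z):
    """U = sum_l w_l z_l with Python complex numbers (no conjugation)."""
    return sum(wl * zl for wl, zl in zip(w, z))


def error(w_re, w_im, samples):
    """E = sum_j [g_jR(U^{j,R}) + g_jI(U^{j,I})] with g = tanh."""
    total = 0.0
    for x, y, d_re, d_im in samples:
        u_re, u_im = net_input_split(w_re, w_im, x, y)
        total += 0.5 * (math.tanh(u_re) - d_re) ** 2 + 0.5 * (math.tanh(u_im) - d_im) ** 2
    return total


def error_conjugate(w_re, w_im, samples):
    """E = 1/2 sum_j (O^j - d^j)(O^j - d^j)^*, with O = tanh(U^R) + i tanh(U^I)."""
    total = 0.0
    for x, y, d_re, d_im in samples:
        u_re, u_im = net_input_split(w_re, w_im, x, y)
        r = complex(math.tanh(u_re), math.tanh(u_im)) - complex(d_re, d_im)
        total += 0.5 * (r * r.conjugate()).real
    return total


def gradient(w_re, w_im, samples):
    """Analytic dE/dw^R and dE/dw^I obtained by the chain rule."""
    g_re = [0.0] * len(w_re)
    g_im = [0.0] * len(w_im)
    for x, y, d_re, d_im in samples:
        u_re, u_im = net_input_split(w_re, w_im, x, y)
        # g_jR'(t) = (tanh t - d^{j,R}) * (1 - tanh(t)**2), and likewise for g_jI'
        a_re = (math.tanh(u_re) - d_re) * (1.0 - math.tanh(u_re) ** 2)
        a_im = (math.tanh(u_im) - d_im) * (1.0 - math.tanh(u_im) ** 2)
        for l in range(len(w_re)):
            g_re[l] += a_re * x[l] + a_im * y[l]
            g_im[l] += -a_re * y[l] + a_im * x[l]
    return g_re, g_im


def finite_difference(w_re, w_im, samples, h=1e-6):
    """Central differences of E, one coordinate of w^R or w^I at a time."""
    def partial(part, l):
        plus, minus = [list(w_re), list(w_im)], [list(w_re), list(w_im)]
        plus[part][l] += h
        minus[part][l] -= h
        return (error(*plus, samples) - error(*minus, samples)) / (2 * h)
    return ([partial(0, l) for l in range(len(w_re))],
            [partial(1, l) for l in range(len(w_im))])


# Hypothetical weights and samples (x^j, y^j, d^{j,R}, d^{j,I}) for L = 2 input neurons.
w_re, w_im = [0.3, -0.5], [-0.2, 0.1]
samples = [([1.0, -0.4], [2.0, 0.7], 0.4, -0.3),
           ([-0.7, 1.0], [0.2, 0.0], -0.6, 0.5)]

u = net_input_complex([complex(a, b) for a, b in zip(w_re, w_im)],
                      [complex(a, b) for a, b in zip(samples[0][0], samples[0][1])])
analytic = gradient(w_re, w_im, samples)
numeric = finite_difference(w_re, w_im, samples)
```

For the first sample, (0.3 − 0.2i)(1 + 2i) + (−0.5 + 0.1i)(−0.4 + 0.7i) = 0.83 + 0.01i, and the split formula reproduces the same value from real inner products alone.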
Now we describe the OSCG algorithm. Given initial weights at time 0, the OSCG algorithm updates the weight vector by treating the real part and the imaginary part separately:

For , and denote that

Then (2.8) can be rewritten as

Given and a positive constant , we choose learning rate as

Equation (2.11) can be rewritten as

and this implies that

This type of learning rate is often used in neural network training [16].
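One training cycle of the online algorithm, together with a decreasing learning rate, can be sketched as follows. This is a minimal sketch under stated assumptions: the samples are presented in a fixed order within each cycle, w^R and w^I are both updated from the same pre-update weights after every sample, g = tanh, and the learning rate follows the recursion 1/η_{m+1} = 1/η_m + β, i.e. η_m = η_0/(1 + mβη_0), which decays like 1/(βm). That recursion is an assumption made for illustration and is not necessarily the exact form of (2.11); the training set and parameter values are hypothetical.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def error(w_re, w_im, samples):
    """Square error E with g = tanh, using the split form of the net input."""
    total = 0.0
    for x, y, d_re, d_im in samples:
        u_re = dot(w_re, x) - dot(w_im, y)
        u_im = dot(w_im, x) + dot(w_re, y)
        total += 0.5 * (math.tanh(u_re) - d_re) ** 2 + 0.5 * (math.tanh(u_im) - d_im) ** 2
    return total


def oscg_cycle(w_re, w_im, samples, eta):
    """One cycle: visit samples j = 1..J in order, updating after each one."""
    for x, y, d_re, d_im in samples:
        u_re = dot(w_re, x) - dot(w_im, y)
        u_im = dot(w_im, x) + dot(w_re, y)
        a_re = (math.tanh(u_re) - d_re) * (1.0 - math.tanh(u_re) ** 2)
        a_im = (math.tanh(u_im) - d_im) * (1.0 - math.tanh(u_im) ** 2)
        # Both new vectors are built from the current (pre-update) weights.
        w_re, w_im = (
            [w_re[l] - eta * (a_re * x[l] + a_im * y[l]) for l in range(len(w_re))],
            [w_im[l] - eta * (-a_re * y[l] + a_im * x[l]) for l in range(len(w_im))],
        )
    return w_re, w_im


def next_learning_rate(eta, beta):
    """Assumed schedule 1/eta_{m+1} = 1/eta_m + beta (hypothetical form)."""
    return 1.0 / (1.0 / eta + beta)


# Hypothetical training set (x^j, y^j, d^{j,R}, d^{j,I}) and parameters.
samples = [([1.0, 1.0], [0.0, 0.0], 0.5, -0.5),
           ([0.0, 1.0], [1.0, 0.0], -0.5, 0.5)]
w_re, w_im = [0.0, 0.0], [0.0, 0.0]
eta0, beta = 0.2, 0.05
eta = eta0
errors = [error(w_re, w_im, samples)]
for _ in range(200):
    w_re, w_im = oscg_cycle(w_re, w_im, samples, eta)
    eta = next_learning_rate(eta, beta)
    errors.append(error(w_re, w_im, samples))
```

Keeping η fixed within a cycle and shrinking it only between cycles matches the indexing η_m by cycle number m used in the text.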

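For the convergence analysis of the OSCG algorithm, as for the batch version of the split-complex gradient algorithm [17], we need the following assumptions.

(A1) There exists a constant $C > 0$ such that the activation function $g$ and its first and second derivatives satisfy $|g(t)| \le C$, $|g'(t)| \le C$, and $|g''(t)| \le C$ for all $t \in \mathbb{R}$.

(A2) The set $\Omega_0 = \{\mathbf{w} : \partial E / \partial \mathbf{w}^R = 0,\ \partial E / \partial \mathbf{w}^I = 0\}$ of stationary points of the error function contains only finitely many points.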
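Assumption (A1), boundedness of the activation function and its derivatives up to second order, is satisfied for instance by g = tanh, which is what MATLAB's tansig computes: |tanh t| < 1, tanh' t = 1 − tanh² t ∈ (0, 1], and |tanh'' t| = |2 tanh t (1 − tanh² t)| ≤ 4/(3√3) ≈ 0.770. The short Python check below evaluates these three functions on a grid; it is a numerical sanity check under that choice of g, not a proof.

```python
import math


def tanh_and_derivatives(t):
    """Return g(t), g'(t), g''(t) for g = tanh."""
    th = math.tanh(t)
    d1 = 1.0 - th * th   # g'(t)
    d2 = -2.0 * th * d1  # g''(t)
    return th, d1, d2


grid = [k / 1000.0 for k in range(-20000, 20001)]  # t in [-20, 20]
values = [tanh_and_derivatives(t) for t in grid]
sup_g = max(abs(v[0]) for v in values)
sup_d1 = max(abs(v[1]) for v in values)
sup_d2 = max(abs(v[2]) for v in values)
C = max(sup_g, sup_d1, sup_d2)  # a grid estimate of a constant that works in (A1)
```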

3. Main Results

In this section, we will give several lemmas and the main convergence theorems. The proofs of those results are postponed to the next section.

In order to derive the convergence theorem, we need to estimate the values of the error function (2.4) at two successive cycles of the training iteration. Denote that

where , and . The first lemma decomposes the change of the error function (2.4) between two successive training cycles into several terms.

Lemma 3.1. Suppose Assumption (A1) is valid. Then one has where , , each lies on the segment between and , and each lies on the segment between and .

The second lemma gives estimates of some terms in (3.2).

Lemma 3.2. Suppose Assumptions and hold, for , then one has where are constants and

From Lemmas 3.1 and 3.2, we can derive the following lemma.

Lemma 3.3. Suppose Assumptions and hold, for , then one has where is a constant.

With Lemmas 3.1–3.3, we can prove the following monotonicity result for the OSCG algorithm.

Theorem 3.4. Let be given by (2.11) and let the weight sequence be generated by (2.8). Then under Assumption (A1), there are positive numbers and such that for any and one has

To prove the convergence theorem, we also need the following estimate.

Lemma 3.5. Let be given by (2.11). Then under Assumption (A1), there are the same positive numbers and as in Theorem 3.4 such that for any and one has

The following lemma gives an estimate of a series, which is essential for the proof of the convergence theorem.

Lemma 3.6 (see [16]). Suppose that a series is convergent and . If there exists a constant such that then

The following lemma will be used to prove the convergence of the weight sequence.

Lemma 3.7. Suppose that the function is continuous and differentiable on a compact set and that contains only finite points. If a sequence satisfies then there exists a point such that

Now we are ready to give the main convergence theorem.

Theorem 3.8. Let be given by (2.11) and let the weight sequence be generated by (2.8). Then under Assumption (A1), there are positive numbers and such that for any and one has Furthermore, if Assumption (A2) also holds, then there exists a point such that

4. Proofs

Proof of Lemma 3.1. Using Taylor's formula, we have where lies on the segment between and . Similarly we also have a point between and such that From (2.8) and (2.10) we have Combining (2.4), (2.9), (3.1), and (4.1)–(4.3), then we have where

Proof of Lemma 3.2. From (2.5) and Assumption (A1) we know that the functions , , , , , and are all bounded. Thus there is a constant such that By (2.9), (2.10), (3.1), and the Mean-Value Theorem, for and we have where . Similarly we have In particular, as , for , we can get where . For , , suppose that where are nonnegative constants. Recalling , we have where and . Similarly, we also have Thus, by setting , we have (3.3). Now we prove (3.4). Using (3.3) and the Cauchy-Schwarz inequality, we have where . This validates (3.4). Finally, we show (3.5). Using (2.10), (3.1), (3.3), and (4.3), we have where . Similarly we also have This together with (2.9) and (4.6) leads to where and . This completes the proof.

Proof of Lemma 3.3. Recalling Lemmas 3.1 and 3.2, we conclude that Then (3.6) is obtained by letting .

Proof of Theorem 3.4. By virtue of (3.6), the key to proving this theorem is to verify that In the following we prove (4.18) by induction. First we take such that For suppose that Next we will prove that Notice that where lies on the segment between and , and lies on the segment between and . Similar to (4.14), we also have the following estimate: where . By (4.6) and (4.22)-(4.23) we know that there are positive constants and such that where . Squaring both sides of the above inequality gives Now we sum up the above inequality over and obtain Let then On the other hand, from (4.22) we have Similar to the deduction of (4.24), from (4.29) we have It can be easily verified that, for any positive numbers , , , if , Applying (4.31) to (4.30) implies that Similarly, we can obtain the counterpart of (4.28) as and the counterpart of (4.32) as From (4.28) and (4.33) we have From (4.32) and (4.34) we have Using (2.11) and (4.36), we can get Multiplying (4.37) by gives Using (4.20) and (4.35), we obtain Combining (4.38) and (4.39) we have Thus to validate (4.21) we only need to prove the following inequality: Recalling (4.20) and , it is easy to see that each term of (4.41) can be assured for and by setting This validates (4.41). As a result, (4.18) and (3.7) are proved.

Proof of Lemma 3.5. From Lemma 3.3 we have Summing the above inequalities over , we obtain Note that for . Setting , we have Using (2.9) and (4.6), we can find a constant such that This together with (2.13) leads to Thus from (4.45) and (4.47) it holds that Recalling from (2.12) gives

Proof of Lemma 3.6. This lemma is the same as Lemma of [16].

Proof of Lemma 3.7. This result is almost the same as Theorem in [18], and the details of the proof are omitted.

Proof of Theorem 3.8. Using (2.9), (4.6), (4.14), and (4.15), we can find a constant such that From (2.6) and (4.22) we have where and are defined in (4.22). Thus, from (4.6), (4.50), and the Cauchy-Schwarz inequality, there exists a constant such that for any vector Using (2.6), (2.9), and Lemma 3.5, we have From (4.52), (4.53), and Lemma 3.6 it holds that Since is arbitrary in , we have Therefore, when , we complete the proof of , and we can similarly show that for . Thus, we have shown that In a similar way, we can also prove that Thus, (3.13) is obtained from (4.56) and (4.57).
Next we prove (3.14). Using (2.10), we have
Similar to (4.46), we know that is bounded. Recalling (2.13), we conclude that which implies that Similarly, we have Write then the square error function can be regarded as a real-valued function . Thus from (4.56), (4.57), (4.60), and (4.61) we have
Furthermore, from Assumption (A2) we know that the set contains only finitely many points. Thus, the sequence here satisfies all the conditions needed in Lemma 3.7. As a result, there is a which satisfies . Since consists of the real and imaginary parts of , we know that there is a such that . This completes the proof of (3.14).

5. Numerical Example

In this section we illustrate the convergence behavior of the OSCG algorithm with a simple numerical example. The well-known XOR problem is a benchmark in the neural network literature. As in [13], the training samples of the encoded XOR problem for the CVNN are presented as follows:

This example uses a network with two input nodes (including a bias node) and one output node. The transfer function is tansig in MATLAB, a commonly used sigmoid function. The parameter is set to be and is set to be . We carry out the test with the initial components of the weights randomly chosen in []. Figure 1 shows that the gradients tend to zero and that the square error decreases monotonically as the number of iterations increases, eventually tending to a constant. This supports our theoretical analysis.
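The two quantities reported in Figure 1, the square error after each cycle and the norm of the error gradient, can be tracked with a loop like the one below. This is a minimal sketch and not a reproduction of the experiment: because it does not use the encoded XOR samples of [13], it relies on hypothetical complex inputs whose targets are generated by a hypothetical "teacher" weight vector. It keeps the two-input layout with a bias input 1 + 0i as the second node and g = tanh (what tansig computes), and assumes the learning-rate recursion 1/η_{m+1} = 1/η_m + β, an initialization interval of [−0.5, 0.5], and the values η_0 = 0.1, β = 0.01; all of these are illustrative choices.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def split_input(w_re, w_im, x, y):
    return dot(w_re, x) - dot(w_im, y), dot(w_im, x) + dot(w_re, y)


def error_and_grad_norm(w_re, w_im, samples):
    """Square error E and the Euclidean norm of (dE/dw^R, dE/dw^I), with g = tanh."""
    e = 0.0
    g_re = [0.0] * len(w_re)
    g_im = [0.0] * len(w_im)
    for x, y, d_re, d_im in samples:
        u_re, u_im = split_input(w_re, w_im, x, y)
        r_re, r_im = math.tanh(u_re) - d_re, math.tanh(u_im) - d_im
        e += 0.5 * (r_re ** 2 + r_im ** 2)
        a_re = r_re * (1.0 - math.tanh(u_re) ** 2)
        a_im = r_im * (1.0 - math.tanh(u_im) ** 2)
        for l in range(len(w_re)):
            g_re[l] += a_re * x[l] + a_im * y[l]
            g_im[l] += -a_re * y[l] + a_im * x[l]
    return e, math.sqrt(sum(v * v for v in g_re + g_im))


def train(samples, eta0, beta, cycles, seed=0):
    """Online split-complex training; returns (error, gradient norm) per cycle."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w_re = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # hypothetical init interval
    w_im = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    eta = eta0
    history = [error_and_grad_norm(w_re, w_im, samples)]
    for _ in range(cycles):
        for x, y, d_re, d_im in samples:
            u_re, u_im = split_input(w_re, w_im, x, y)
            a_re = (math.tanh(u_re) - d_re) * (1.0 - math.tanh(u_re) ** 2)
            a_im = (math.tanh(u_im) - d_im) * (1.0 - math.tanh(u_im) ** 2)
            w_re, w_im = (
                [w_re[l] - eta * (a_re * x[l] + a_im * y[l]) for l in range(n)],
                [w_im[l] - eta * (-a_re * y[l] + a_im * x[l]) for l in range(n)],
            )
        eta = 1.0 / (1.0 / eta + beta)  # assumed learning-rate schedule
        history.append(error_and_grad_norm(w_re, w_im, samples))
    return history


def make_samples(inputs, teacher_re, teacher_im):
    """Hypothetical targets from a teacher network; node 2 is the bias input 1 + 0i."""
    samples = []
    for z in inputs:
        x, y = [z.real, 1.0], [z.imag, 0.0]
        u_re, u_im = split_input(teacher_re, teacher_im, x, y)
        samples.append((x, y, math.tanh(u_re), math.tanh(u_im)))
    return samples


samples = make_samples([0j, 1j, 1 + 0j, 1 + 1j], [0.8, -0.3], [-0.6, 0.2])
history = train(samples, eta0=0.1, beta=0.01, cycles=2000)
errors = [e for e, _ in history]
grad_norms = [gn for _, gn in history]
```

Plotting `errors` and `grad_norms` against the cycle index gives curves of the kind described for Figure 1, under these hypothetical settings.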

6. Conclusion

In this paper we investigate some convergence properties of an OSCG training algorithm for two-layered CVNNs with an adaptive learning rate. Under the condition that the activation function and its derivatives up to second order are bounded, we prove that the error function decreases monotonically during the training process. With this result, we further prove that the gradient of the error function tends to zero and the weight sequence tends to a fixed point. A numerical example is given to support our theoretical analysis. We mention that these results are interestingly similar to the convergence results for the batch split-complex gradient training algorithm for CVNNs given in [17]. Thus our results can also serve as a theoretical explanation of the relationship between the OSCG algorithm and the batch split-complex algorithm. The convergence results in this paper can be generalized to a more general case, namely, multilayer CVNNs.

Acknowledgments

The authors wish to thank the associate editor and the anonymous reviewers for their helpful comments and valuable suggestions regarding this paper. This work is supported by the National Natural Science Foundation of China (70971014).

References

[1] S. Alonso-Quesada and M. De La Sen, "Robust adaptive control with multiple estimation models for stabilization of a class of non-inversely stable time-varying plants," Asian Journal of Control, vol. 6, no. 1, pp. 59–73, 2004.
[2] P. Zhao and O. P. Malik, "Design of an adaptive PSS based on recurrent adaptive control theory," IEEE Transactions on Energy Conversion, vol. 24, no. 4, pp. 884–892, 2009.
[3] J. Cortés, "Distributed Kriged Kalman filter for spatial estimation," IEEE Transactions on Automatic Control, vol. 54, no. 12, pp. 2816–2827, 2009.
[4] D. R. Wilson and T. R. Martinez, "The general inefficiency of batch training for gradient descent learning," Neural Networks, vol. 16, no. 10, pp. 1429–1451, 2003.
[5] J. Li, Y. Diao, M. Li, and X. Yin, "Stability analysis of discrete Hopfield neural networks with the nonnegative definite monotone increasing weight function matrix," Discrete Dynamics in Nature and Society, vol. 2009, Article ID 673548, 10 pages, 2009.
[6] Q. Zhu and J. Cao, "Stochastic stability of neural networks with both Markovian jump parameters and continuously distributed delays," Discrete Dynamics in Nature and Society, vol. 2009, Article ID 490515, 20 pages, 2009.
[7] G. M. Georgiou and C. Koutsougeras, "Complex domain backpropagation," IEEE Transactions on Circuits and Systems II, vol. 39, no. 5, pp. 330–334, 1992.
[8] N. Benvenuto and F. Piazza, "On the complex backpropagation algorithm," IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 967–969, 1992.
[9] A. Hirose, Complex-Valued Neural Networks, Springer, New York, NY, USA, 2006.
[10] T. Kim and T. Adali, "Fully complex backpropagation for constant envelope signal processing," in Proceedings of the IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing X, pp. 231–240, Sydney, Australia, December 2000.
[11] A. I. Hanna and D. P. Mandic, "A data-reusing nonlinear gradient descent algorithm for a class of complex-valued neural adaptive filters," Neural Processing Letters, vol. 17, no. 1, pp. 85–91, 2003.
[12] S. L. Goh and D. P. Mandic, "Stochastic gradient-adaptive complex-valued nonlinear neural adaptive filters with a gradient-adaptive step size," IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1511–1516, 2007.
[13] T. Nitta, "Orthogonality of decision boundaries in complex-valued neural networks," Neural Computation, vol. 16, no. 1, pp. 73–97, 2004.
[14] S. S. Yang, S. Siu, and C. L. Ho, "Analysis of the initial values in split-complex backpropagation algorithm," IEEE Transactions on Neural Networks, vol. 19, no. 9, pp. 1564–1573, 2008.
[15] H. Zhang, W. Wu, F. Liu, and M. Yao, "Boundedness and convergence of online gradient method with penalty for feedforward neural networks," IEEE Transactions on Neural Networks, vol. 20, no. 6, pp. 1050–1054, 2009.
[16] W. Wu, G. Feng, Z. Li, and Y. Xu, "Deterministic convergence of an online gradient method for BP neural networks," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 533–540, 2005.
[17] H. Zhang, C. Zhang, and W. Wu, "Convergence of batch split-complex backpropagation algorithm for complex-valued neural networks," Discrete Dynamics in Nature and Society, vol. 2009, Article ID 329173, 16 pages, 2009.
[18] J. Ortega and W. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, USA, 1970.

Copyright

Copyright © 2010 Huisheng Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
