Abstract

In many applications, it is natural to use interval data to describe various kinds of uncertainties. This paper is concerned with an interval neural network with one hidden layer. As indicated by our numerical experiments, the original interval neural network may suffer from weight oscillation during the learning procedure. In this paper, a smoothing interval neural network is proposed to prevent this oscillation. Here, by smoothing we mean that, in a neighborhood of the origin, the absolute values of the weights in the hidden and output layers are replaced by a smooth function of the weights. The convergence of a gradient algorithm for training the smoothing interval neural network is proved. Supporting numerical experiments are provided.

1. Introduction

In the last two decades, artificial neural networks have been successfully applied to various domains, including pattern recognition [1], forecasting [2, 3], and data mining [4, 5]. One of the most widely used neural networks is the feedforward neural network trained with the well-known error backpropagation algorithm. However, in most neural network architectures, the input variables and the predicted results are represented as single-point values, not as intervals. In real-life situations, the available information is often uncertain, imprecise, and incomplete; such information can be represented by fuzzy data, a generalization of interval data. In many applications it is therefore more natural to treat the input variables and the predicted results as intervals rather than as single-point values.

Since multilayer feedforward neural networks have a high capability as universal approximators of nonlinear mappings [6-8], several neural-network-based methods for handling interval data have been proposed. For instance, in [9], the BP algorithm [10, 11] was extended to the case of interval input vectors. In [12], the authors proposed a new extension of backpropagation based on interval arithmetic, called Interval Arithmetic Backpropagation (IABP). This algorithm permits training samples and targets that may be either points or intervals. In [13], the author proposed a new model of multilayer perceptron based on interval arithmetic that handles interval input and output data, where the weights and biases are single-valued rather than interval-valued.

However, weight oscillation phenomena during the learning procedure were observed in our numerical experiments for these interval neural network models. In order to prevent this oscillation, a smoothing interval neural network is proposed in this paper. Here, by smoothing we mean that, inside the activation function and in a neighborhood of the origin, the absolute values of the weights are replaced by a smooth function of the weights. Gradient algorithms [14-17] are applied to train the smoothing interval neural network. Weak and strong convergence theorems for the algorithm are proved. Supporting numerical results are provided.

The remainder of this paper is organized as follows. Some basic notations of interval analysis are described in Section 2. The traditional interval neural network is introduced in Section 3. Section 4 is devoted to our smoothing interval neural network and the gradient algorithm. The convergence results of the gradient learning algorithm are shown in Section 5. Supporting numerical experiments are provided in Section 6. The appendix is devoted to the proof of the theorem.

2. Interval Arithmetic

Interval arithmetic appeared as a tool in numerical computing in the late 1950s. Interval mathematics is a theory introduced by Moore [18] and Sunaga [19] in order to control errors in numerical computations. The fundamentals used in this paper are described below.

Let us denote intervals by uppercase letters and real numbers by lowercase letters. An interval can be represented by its lower and upper bounds, or equivalently by its midpoint and radius. For two intervals, the basic interval operations, namely addition and multiplication by a real constant, are defined in the usual way.
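The displayed formulas of this paragraph were lost in the extraction. For the reader's convenience, the block below records the standard midpoint-radius conventions of Moore-style interval arithmetic; the symbols used here are illustrative and not necessarily the paper's original notation.

```latex
% Standard midpoint-radius conventions of interval arithmetic. The symbols
% A, B, a^L, a^U, a^C, a^R and k are introduced here for illustration only,
% since the paper's own displays were not recovered.
A = [a^L, a^U] = \langle a^C, a^R \rangle, \qquad
a^C = \tfrac{1}{2}\,(a^L + a^U), \qquad
a^R = \tfrac{1}{2}\,(a^U - a^L),
\qquad
A + B = [\,a^L + b^L,\ a^U + b^U\,], \qquad
kA = \begin{cases}
[\,k a^L,\ k a^U\,], & k \ge 0,\\
[\,k a^U,\ k a^L\,], & k < 0,
\end{cases}
\qquad k \in \mathbb{R}.
```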

If the activation function is increasing, then its interval output is obtained by applying it to the two endpoints of the input interval. In this paper, we use a weighted Euclidean distance for pairs of intervals. The weight parameter makes it possible to give more importance either to the prediction of the output centres or to the prediction of the radii: at one extreme, learning concentrates on the prediction of the output interval centre and no importance is given to the prediction of its radius, while for the balanced choice both predictions (centres and radii) carry the same weight in the objective function. For our purpose, the weight parameter is kept fixed throughout.
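The distance formula itself also did not survive the extraction. A common form of a weighted Euclidean distance between two intervals, written in midpoint-radius notation with an assumed weight parameter beta, is:

```latex
% Assumed form of the weighted distance; beta and this exact expression are
% illustrative, not recovered from the original text.
d^2(A, B) = \beta\,(a^C - b^C)^2 + (1 - \beta)\,(a^R - b^R)^2,
\qquad 0 \le \beta \le 1.
```

With this form, beta = 1 makes learning concentrate on the centres only, while beta = 1/2 gives centres and radii the same weight, matching the two cases described above.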

3. Interval Neural Network

In this paper, we consider an interval neural network with three layers, where the inputs and outputs are interval-valued and the weights are real-valued. The numbers of neurons in the input, hidden, and output layers are fixed. A weight matrix connects the input and hidden layers, and a weight vector connects the hidden and output layers; to simplify the presentation, all the weights are collected into a single vector. In the interval neural network, a nonlinear activation function is used in the hidden layer and a linear activation function in the output layer.

For an arbitrary interval-valued input, since the weights of the proposed structure are real-valued, the linear combination of the inputs at each hidden neuron results in an interval: its midpoint is the weighted sum of the input midpoints, and its radius is the sum of the input radii weighted by the absolute values of the weights. The output of each interval neuron in the hidden layer is then obtained by applying the activation function to the endpoints of this interval. Finally, the output of the interval neuron in the output layer is obtained by the same rule with the linear activation function.
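As a concrete illustration of the forward computation just described, the sketch below evaluates the network in midpoint/radius form. The function and variable names (forward_interval, sigmoid, W, v) are assumptions, and the sigmoid is only one possible increasing activation; the paper's displayed formulas were not recovered.

```python
import numpy as np

def sigmoid(t):
    # One possible increasing activation for the hidden layer (assumed).
    return 1.0 / (1.0 + np.exp(-t))

def forward_interval(x_c, x_r, W, v):
    """Sketch of the interval forward pass in midpoint/radius form.

    x_c, x_r : midpoints and radii of the interval inputs, shape (p,)
    W        : real-valued hidden-layer weights, shape (n, p)
    v        : real-valued output-layer weights, shape (n,)
    """
    # Linear combination of intervals with real weights: the midpoint uses the
    # weights themselves, the radius uses their absolute values.
    s_c = W @ x_c
    s_r = np.abs(W) @ x_r
    # An increasing activation maps an interval endpoint-wise.
    lo, up = sigmoid(s_c - s_r), sigmoid(s_c + s_r)
    h_c, h_r = 0.5 * (up + lo), 0.5 * (up - lo)
    # Linear output neuron: same midpoint/radius rule as above.
    return v @ h_c, np.abs(v) @ h_r
```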

4. Smoothing Interval Neural Network

4.1. Smoothing Interval Neural Network Structure

As revealed in the numerical experiment below, weight oscillation phenomena appear during the learning procedure for the original interval neural network presented in the last section. In order to prevent this oscillation, we propose a smoothing interval neural network, obtained by replacing the absolute values of the weights in (3.1) and (3.5) with a smooth function of the weights. The outputs of the smoothing interval neurons in the hidden layer and in the output layer are then defined accordingly. For our purpose, the smoothing function can be chosen as any smooth function that approximates the absolute value near the origin. For definiteness and simplicity, we choose it as a polynomial function on a small neighborhood of the origin. We observe that the function defined in this way is convex and is identical to the absolute value function outside this zero neighborhood.
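The explicit smoothing polynomial did not survive the extraction. As an illustration only (not necessarily the paper's choice), a convex, continuously differentiable function that coincides with the absolute value outside a small neighborhood of the origin is the quadratic spline below.

```latex
% Illustrative smoothing of |x|; mu and this particular polynomial are
% assumptions, not the paper's recovered formula.
f_{\mu}(x) =
\begin{cases}
\dfrac{x^{2}}{2\mu} + \dfrac{\mu}{2}, & |x| < \mu,\\[6pt]
|x|, & |x| \ge \mu,
\end{cases}
\qquad \mu > 0 \text{ small}.
```

Such a function is convex, continuously differentiable, and identical to the absolute value outside the neighborhood, which are exactly the properties used in the text.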

4.2. Gradient Algorithm of the Smoothing Interval Neural Network

Suppose that we are supplied with a training sample set consisting of interval-valued inputs and the corresponding ideal interval-valued outputs. Our task is to find weights such that the network output matches the ideal output for every training sample, as required in (4.6). Usually, however, a weight vector satisfying (4.6) does not exist and, instead, the aim of the network learning is to choose the weights so as to minimize an error function of the smoothing interval neural network. By (2.4), a simple and typical choice is the quadratic error function (4.7), which can be rewritten in a compact sample-wise form. Now we introduce the gradient algorithm [15, 16] for the smoothing interval neural network. The gradients of the error function with respect to the hidden-layer weights and the output-layer weights are given by (4.9) and (4.11), with the corresponding auxiliary quantities defined in (4.10) and (4.12). In the learning procedure, the weights are iteratively refined according to (4.13) and (4.14), with a constant learning rate.
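The displayed formulas of this subsection did not survive the extraction, so the sketch below only illustrates the training procedure described in words above: a quadratic error in midpoint/radius form minimized by plain gradient descent with a constant learning rate. All names (smooth_abs, output, error, train, eta, mu, beta) and the initialization range are assumptions, and the gradient is approximated numerically rather than by the closed-form expressions (4.9)-(4.12).

```python
import numpy as np

def smooth_abs(w, mu=0.1):
    # Illustrative smoothing of |w| (see the example in Section 4.1).
    return np.where(np.abs(w) >= mu, np.abs(w), w**2 / (2 * mu) + mu / 2)

def output(theta, x_c, x_r, n_hidden, n_in):
    # Forward pass of the smoothing interval network; theta packs W and v.
    W = theta[: n_hidden * n_in].reshape(n_hidden, n_in)
    v = theta[n_hidden * n_in:]
    s_c, s_r = W @ x_c, smooth_abs(W) @ x_r
    lo, up = np.tanh(s_c - s_r), np.tanh(s_c + s_r)   # increasing activation (assumed)
    h_c, h_r = 0.5 * (up + lo), 0.5 * (up - lo)
    return v @ h_c, smooth_abs(v) @ h_r

def error(theta, samples, n_hidden, n_in, beta=0.5):
    # Quadratic error in midpoint/radius form, cf. (4.7); beta is assumed.
    E = 0.0
    for x_c, x_r, t_c, t_r in samples:
        y_c, y_r = output(theta, x_c, x_r, n_hidden, n_in)
        E += beta * (y_c - t_c) ** 2 + (1 - beta) * (y_r - t_r) ** 2
    return 0.5 * E

def train(samples, n_in, n_hidden, eta=0.05, iters=2000, h=1e-6, seed=0):
    # Plain gradient descent with a constant learning rate; the gradient is
    # approximated by central differences instead of the closed forms.
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-0.5, 0.5, n_hidden * n_in + n_hidden)
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = h
            grad[i] = (error(theta + e, samples, n_hidden, n_in)
                       - error(theta - e, samples, n_hidden, n_in)) / (2 * h)
        theta -= eta * grad
    return theta
```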

5. Convergence Theorem for SINN

For any weight vector, we use the Euclidean norm. Let the stationary point set of the error function be considered within a bounded region satisfying Assumption (A2) below, and let its projection onto each coordinate axis be defined in the obvious way. To analyze the convergence of the algorithm, we need the following assumptions: (A1) the activation function, together with its derivatives, is uniformly bounded; (A2) there exists a bounded region in which the weight sequence generated by the algorithm remains; (A3) the learning rate is small enough that the condition imposed in the appendix is valid; (A4) the projection of the stationary point set onto each coordinate axis does not contain any interior point.

Now we are ready to present a convergence theorem for the learning algorithm. Its proof is given in the appendix.

Theorem 5.1. Let the error function be defined by (4.7), and let the weight sequence be generated by the learning procedure (4.13) and (4.14) for the smoothing interval neural network, starting from an arbitrary initial guess. If Assumptions (A1), (A2), and (A3) are valid, then the gradient of the error function tends to zero along the weight sequence, as stated in (5.3). Furthermore, if Assumption (A4) also holds, then there exists a stationary point to which the weight sequence converges, as stated in (5.4).
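The displayed conclusions (5.3) and (5.4) were lost in the extraction. Under an assumed notation (E_W for the gradient of the error function, W^n for the weight sequence, W* for its limit), they presumably take the usual weak/strong convergence form:

```latex
% Presumed form of (5.3) and (5.4); the notation here is assumed, not recovered.
\lim_{n \to \infty} \bigl\| E_W(W^n) \bigr\| = 0,
\qquad \text{(5.3)}
\lim_{n \to \infty} W^n = W^{*}, \quad E_W(W^{*}) = 0.
\qquad \text{(5.4)}
```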

6. Numerical Experiment

We compare the performances of the interval neural network and the smoothing interval neural network by approximating a simple interval function. In this example, the training set contains five training samples whose midpoints are all 0; their radii and the corresponding target output intervals are prescribed by the approximated interval function.

For the above two interval neural networks, the error function is defined as in (4.7). In order to see the error more clearly in the figures, we also use a second error measure.

The number of training iterations is 2000, the initial weight vector is selected randomly, and two neurons are used in the hidden layer. A fixed learning rate is used.
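A sketch of how this experiment could be set up, reusing the train routine sketched in Section 4.2, is given below. The radii, target intervals, and learning rate shown are placeholders, since the paper's actual sample values were not recovered.

```python
import numpy as np

# Placeholder training set: five interval inputs with midpoint 0. The radii and
# target intervals below are NOT the paper's values (those were lost in the
# extraction) and only illustrate the expected data layout: each sample is
# (input midpoint, input radius, target midpoint, target radius).
radii = [0.2, 0.4, 0.6, 0.8, 1.0]
samples = [(np.array([0.0]), np.array([r]), 0.0, r ** 2) for r in radii]

# Two hidden neurons, 2000 iterations, a constant (assumed) learning rate, and a
# random initial weight vector, as described in the text.
theta = train(samples, n_in=1, n_hidden=2, eta=0.05, iters=2000)
```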

In the learning procedure for the interval neural network, we clearly see from Figure 1(a) that the gradient norm does not converge, and Figure 2(a) shows that the error function oscillates and does not converge. In contrast, Figure 1(b) shows that the gradient norm of the smoothing interval neural network converges, and Figure 2(b) shows that its error function, as well as the second error measure, is monotonically decreasing and convergent.

From this numerical experiment, we can see that the proposed smoothing interval neural network efficiently avoids oscillation during the training process.

Appendix

First, we give Lemmas A.1 and A.2. Then, we use them to prove Theorem 5.1.

Lemma A.1. Let a bounded sequence be given whose successive differences tend to zero. Write its limit inferior, its limit superior, and the set of limits of its convergent subsequences. Then (A.1) holds; that is, the set of subsequential limits is the entire closed interval between the limit inferior and the limit superior.

Proof. It is obvious that the limit inferior and the limit superior are themselves subsequential limits and that every subsequential limit lies between them. If the two coincide, then (A.1) follows simply from the convergence of the whole sequence. Let us consider the case in which they differ and proceed to prove that every point strictly between them is also a subsequential limit.
Fix such a point and a small neighborhood around it. Since the successive differences tend to zero, the sequence travels between values near the limit inferior and values near the limit superior with arbitrarily small steps for all sufficiently large indices. Hence, infinitely many points of the sequence must fall into the chosen neighborhood. This implies that the chosen point is a subsequential limit, and thus the whole closed interval consists of subsequential limits. Together with the opposite inclusion noted above, this yields (A.1). This completes the proof.
For any training sample and any iteration, we define the auxiliary notations used in the estimates below.

Lemma A.2. Suppose that Assumption (A1) holds. Then, for any training sample and any iteration, the estimates (A.3)-(A.10) hold, where the constant appearing in them is independent of the sample and the iteration, and the intermediate points involved lie on the segments between the corresponding weight vectors.

Proof. Proof of (A.3): for the given training sample set, by Assumption (A1), (4.2), and (4.4), it is easy to see that (A.3) is valid.
Proof of (A.4): it follows directly from (4.9) and (4.14). This proves (A.4).
Proof of (A.5): using the Mean Value Theorem, for any sample and iteration we obtain an expansion whose intermediate points lie on the segments between the corresponding weight vectors, and by (A.3) the resulting factors are bounded. The two cases arising from the definition of the smoothing function are treated separately; according to (A.16) and (A.13), they yield (A.17). By (A.17), a uniform estimate holds for every sample and iteration. According to the definition of the corresponding auxiliary quantity and by (A.3), the remaining factor is bounded, and then (A.18) gives the desired bound with a constant independent of the iteration. This proves (A.5).
Proof of (A.6): using the Taylor expansion, we obtain an expansion whose intermediate points lie on the segments between successive weight vectors. By (A.3) and (A.16), we deduce a corresponding bound, and a similar bound holds for the second term, where the intermediate points again lie on the corresponding segments. Combining these with (A.20) gives (A.23). From (A.23) we obtain further estimates, which, together with (A.25) and (A.26), lead to (A.27). By (A.3), (A.16), and the estimates above, we obtain (A.28), and similarly (A.29) and (A.30); so by (A.28), (A.29), and (A.30), we obtain (A.31). With (A.23), we similarly obtain (A.32), whose intermediate points lie on the corresponding segments, and by (A.32) and (A.34), together with (A.31) and (A.35), we obtain (A.36) and (A.37). By (A.27), (A.31), (A.36), and (A.37), and combining with (4.11), (4.12), and (4.14), we arrive at the desired bound with a constant independent of the iteration. This proves (A.6).
Proof of (A.7): according to the definition of the corresponding auxiliary quantity, combining with (A.3) and (A.18), we obtain the desired bound with a constant independent of the iteration. This proves (A.7).
Proof of (A.8): arguing as for (A.17), for any sample and iteration we obtain (A.41). According to the definition of the corresponding auxiliary quantity and by (A.3), the remaining factor is bounded; by (A.16) and (A.41), we then deduce the desired bound with a constant independent of the iteration. This proves (A.8).
Proof of (A.9): by (A.3) and (A.16), we obtain the desired bound with a constant independent of the iteration. This proves (A.9).
Proof of (A.10): according to the definition of the corresponding auxiliary quantity, combining with (A.3) and (A.41), we obtain the desired bound with a constant independent of the iteration. This proves (A.10) and completes the proof of Lemma A.2.

Now we are ready to prove Theorem 5.1.

Proof. Using the Taylor expansion and Lemma A.2, for any iteration we obtain an estimate of the change of the error function between successive weight vectors, where the intermediate points lie on the corresponding segments. Requiring the learning rate to be small enough, as in Assumption (A3), we conclude that the error sequence is monotonically nonincreasing; together with (A.46), this shows that it converges. Since the error function is bounded below, summing these estimates and letting the number of iterations tend to infinity shows that the accumulated squared gradient norms are finite, which immediately gives (5.3). According to (4.14) and (A.52), the differences between successive weight vectors therefore tend to zero.
According to Assumption (A2), the weight sequence is bounded and therefore has a subsequence that converges to some point. It follows from (5.3) and the continuity of the gradient of the error function that the gradient vanishes at this limit point, which implies that it is a stationary point of the error function. Hence, the weight sequence has at least one accumulation point, and every accumulation point must be a stationary point.
Next, by contradiction, we prove that the weight sequence has precisely one accumulation point. Assume the contrary, that it has at least two distinct accumulation points. It is easy to see from (4.13) and (4.14) that the differences between successive weight vectors tend to zero. Without loss of generality, we assume that the first components of the two accumulation points differ. For any real number strictly between these two first components, Lemma A.1 yields a subsequence of the first components of the weight vectors converging to it. Due to the boundedness of the weight sequence, we may extract a further convergent subsequence and, repeating this procedure coordinate by coordinate, we end up with nested subsequences whose limits have the chosen real number as their first component. The resulting limits are accumulation points of the weight sequence for every such choice of the real number. But this means that the projection of the set of accumulation points onto the first coordinate axis contains interior points; since every accumulation point is a stationary point, this contradicts Assumption (A4). Thus, the weight sequence has a unique accumulation point and, being bounded, converges to this point. This proves (5.4) and completes the proof of Theorem 5.1.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (11171367) and the Fundamental Research Funds for the Central Universities of China.