Computational Intelligence and Neuroscience

Volume 2018, Article ID 7361628, 10 pages

https://doi.org/10.1155/2018/7361628

## Fractional-Order Deep Backpropagation Neural Network

College of Computer Science, Sichuan University, Chengdu 610065, China

Correspondence should be addressed to Yi Zhang; yizhang.scu@outlook.com

Received 13 March 2018; Accepted 6 June 2018; Published 3 July 2018

Academic Editor: Friedhelm Schwenker

Copyright © 2018 Chunhui Bao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In recent years, the research of artificial neural networks based on fractional calculus has attracted much attention. In this paper, we proposed a fractional-order deep backpropagation (BP) neural network model with regularization. The proposed network was optimized by the fractional gradient descent method with the Caputo derivative. We also illustrated the necessary conditions for the convergence of the proposed network. The influence of regularization on the convergence was analyzed with the fractional-order variational method. Experiments performed on the MNIST dataset demonstrate that the proposed network is deterministically convergent and can effectively avoid overfitting.

#### 1. Introduction

It is well known that artificial neural networks (ANNs) are the abstraction, simplification, and simulation of the human brain and reflect its basic characteristics [1]. In recent years, great progress has been made in the research of deep neural networks. Due to their powerful ability of complex nonlinear mapping, ANNs have successfully solved many practical problems in pattern recognition, intelligent robotics, automatic control, prediction, biology, medicine, economics, and other fields [2, 3]. The BP neural network is one of the most basic and typical multilayer feedforward neural networks, trained by the backpropagation (BP) algorithm. BP, an efficient way to optimize ANNs, was first introduced by Werbos in 1974. Then, Rumelhart, McClelland, et al. implemented the BP algorithm in detail in 1987 and applied it to the multilayer network version of Minsky [4–6].

The fractional calculus has a history as long as that of integer-order calculus. In the past three hundred years, the theory of fractional calculus has made great progress [7–11]. Its basics are differentiation and integration of arbitrary fractional order. Nowadays, fractional calculus is widely used in diffusion processes [12–14], viscoelasticity theory [15], automation control [16–18], signal processing [19–21], image processing [22–25], medical imaging [26–28], neural networks [29–37], and many other fields. Due to its long-term memory, nonlocality, and weak singularity characteristics [29–37], fractional calculus has been successfully applied to ANNs. For instance, Boroomand constructed Hopfield neural networks based on fractional calculus [37]. Kaslik analyzed the stability of fractional-order Hopfield neural networks [30]. Pu proposed a fractional steepest descent approach and offered a detailed analysis of its learning conditions, stability, and convergence [38]. Wang applied the fractional steepest descent algorithm to train BP neural networks and proved the monotonicity and convergence of a three-layer example [33]. However, the fractional-order BP neural network model proposed in [33] has three limitations. First, the network in [33] had only 3 layers; it was actually a shallow network and was not suitable for demonstrating the potential of fractional calculus for deep learning. Second, the fractional order of this model was restricted to a limited range without reasonable analysis. Third, the loss function did not contain a regularization term, which is an efficient way to avoid overfitting, especially when the training set is small. Overfitting means that a model has high prediction accuracy on the training set but low prediction accuracy on the testing set; such a model generalizes poorly, and its application value is greatly reduced.

In this paper, we proposed a deep fractional-order BP neural network with a regularization term, in which the fractional order can be any positive real number. With the fractional-order variational method, the influence of regularization on the convergence of the proposed model was analyzed. The performance of the proposed model was evaluated on the MNIST dataset.

The structure of the paper is as follows: in Section 2, the definitions and simple properties of fractional calculus are introduced. In Section 3, the proposed fractional-order multilayer BP neural networks are given in detail. In Section 4, the necessary conditions and the influence of regularization for the convergence of the proposed BP algorithm are stated. In Section 5, experimental results are presented to illustrate the effectiveness of our model. Finally, the paper is concluded in Section 6.

#### 2. Background Theory for Fractional Calculus

In this section, the basic knowledge of fractional calculus is introduced, including the definitions and several simple properties used in this paper.

Different from integer calculus, the fractional derivative does not yet have a unified definition. The commonly used definitions of the fractional derivative are the Grünwald-Letnikov (G-L), Riemann-Liouville (R-L), and Caputo derivatives [7–11].

The following is the G-L definition of the fractional derivative:

$$ {}_{a}D_{x}^{v}f(x)=\lim_{h\to 0}\frac{1}{h^{v}}\sum_{k=0}^{[(x-a)/h]}(-1)^{k}\binom{v}{k}f(x-kh), \tag{1} $$

$$ \binom{v}{k}=\frac{\Gamma(v+1)}{\Gamma(k+1)\Gamma(v-k+1)}, \tag{2} $$

where ${}_{a}D_{x}^{v}$ denotes the fractional differential operator based on the G-L definition, $f(x)$ denotes a differintegrable function, $v$ is the fractional order, $[a,x]$ is the domain of $f(x)$, $\Gamma(\cdot)$ is the Gamma function, and $[\cdot]$ is the rounding function.
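As a numerical illustration (our own sketch, not code from the paper), the truncated G-L sum can be evaluated directly. The coefficients $(-1)^{k}\binom{v}{k}$ obey the recursion $w_{k}=w_{k-1}\bigl(1-(v+1)/k\bigr)$, which avoids evaluating the Gamma function at large arguments. For $f(x)=x$ over $[0,x]$, the half-order result should approach $x^{1-v}/\Gamma(2-v)$:

```python
from math import gamma

def gl_derivative(f, x, v, a=0.0, n=10_000):
    """Truncated Grunwald-Letnikov fractional derivative of order v of f
    over [a, x], using n terms with step h = (x - a) / n."""
    h = (x - a) / n
    w, total = 1.0, f(x)              # w_0 = 1 is the k = 0 coefficient
    for k in range(1, n + 1):
        w *= 1.0 - (v + 1.0) / k      # w_k = (-1)^k * C(v, k), recursively
        total += w * f(x - k * h)
    return total / h ** v

# Half-order derivative of f(x) = x at x = 1: exact value is 1 / Gamma(1.5)
approx = gl_derivative(lambda t: t, 1.0, 0.5)
exact = 1.0 / gamma(1.5)
```

With `n = 10_000` the truncated sum agrees closely with the exact value, since the approximation error shrinks with the step size $h$.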

The R-L definition of the fractional derivative is as follows:

$$ {}_{a}^{RL}D_{x}^{v}f(x)=\frac{1}{\Gamma(n-v)}\frac{d^{n}}{dx^{n}}\int_{a}^{x}\frac{f(t)}{(x-t)^{v-n+1}}\,dt, \tag{3} $$

where ${}_{a}^{RL}D_{x}^{v}$ denotes the fractional differential operator based on the R-L definition and $n-1\le v<n$, $n\in\mathbb{N}^{+}$. Moreover, the G-L fractional derivative can be deduced from the definition of the R-L fractional derivative.

The Caputo definition of the fractional derivative is as follows:

$$ {}_{a}^{C}D_{x}^{v}f(x)=\frac{1}{\Gamma(n-v)}\int_{a}^{x}\frac{f^{(n)}(t)}{(x-t)^{v-n+1}}\,dt, \tag{4} $$

where ${}_{a}^{C}D_{x}^{v}$ is the fractional differential operator based on the Caputo definition and $n-1<v\le n$, $n\in\mathbb{N}^{+}$.

Fractional calculus is more difficult to compute than integer calculus. Several mathematical properties used in this paper are given here. The fractional differential of a linear combination of differintegrable functions is as follows:

$$ D^{v}\bigl(k_{1}f(x)+k_{2}g(x)\bigr)=k_{1}D^{v}f(x)+k_{2}D^{v}g(x), \tag{5} $$

where $f(x)$ and $g(x)$ are differintegrable functions and $k_{1}$ and $k_{2}$ are constants.

The fractional differential of the constant function $f(x)=C$ ($C$ is a constant) is different under different definitions.

For the G-L definition,

$$ {}_{a}D_{x}^{v}C=\frac{C(x-a)^{-v}}{\Gamma(1-v)}. \tag{6} $$

For the R-L definition,

$$ {}_{a}^{RL}D_{x}^{v}C=\frac{C(x-a)^{-v}}{\Gamma(1-v)}. \tag{7} $$

And for the Caputo definition,

$$ {}_{a}^{C}D_{x}^{v}C=0. \tag{8} $$

According to (6), (7), and (8), we can know that for the G-L and R-L definitions, the fractional differential of a constant function is not equal to 0. Only with the Caputo definition does the fractional differential of a constant function equal 0, which is consistent with integer-order calculus. Therefore, the Caputo definition is widely used in solving engineering problems, and it was employed to calculate the fractional-order derivatives in this paper. The fractional differential of the power function $f(x)=(x-a)^{\mu}$, $\mu>n-1$, is as follows:

$$ {}_{a}^{C}D_{x}^{v}(x-a)^{\mu}=\frac{\Gamma(\mu+1)}{\Gamma(\mu-v+1)}(x-a)^{\mu-v}. \tag{9} $$
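As a sanity check of the power rule (a sketch of ours; the function names are hypothetical), the Caputo integral definition can be evaluated numerically for $0<v<1$ with a midpoint rule, which sidesteps the weak singularity at $t=x$, and compared with the closed form:

```python
from math import gamma

def caputo_numeric(df, x, v, a=0.0, n=100_000):
    """Caputo derivative of order v (0 < v < 1) from the integral definition,
    given the first derivative df of the function; midpoint quadrature."""
    h = (x - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h         # midpoints avoid the singularity at t = x
        total += df(t) * (x - t) ** (-v)
    return total * h / gamma(1.0 - v)

# f(x) = x^2, so df(x) = 2x; the power rule gives Gamma(3)/Gamma(3 - v) * x^(2 - v)
numeric = caputo_numeric(lambda t: 2.0 * t, 1.0, 0.5)
exact = gamma(3.0) / gamma(2.5) * 1.0
```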

#### 3. Algorithm Description

##### 3.1. Fractional-Order Deep BP Neural Networks

In this section, we introduce the fractional-order deep BP neural network with $M$ layers. $n_{m}$, $1\le m\le M$, is the number of neurons in the $m$-th layer. $W^{m}$ denotes the weight matrix connecting the $m$-th layer and the $(m+1)$-th layer. $F^{m}$ denotes the corresponding activation function for the $m$-th layer. $x_{j}$ and $y_{j}$ are the input and the corresponding ideal output of the $j$-th sample, and the training sample set is $\{(x_{j},y_{j})\}_{j=1}^{J}$. $\mathrm{net}^{m}$ denotes the total inputs of the $m$-th layer. If neurons in the $m$-th layer are not connected to any neurons in the previous layer, these neurons are called external outputs of the $m$-th layer, denoted as $O_{E}^{m}$. On the contrary, if neurons in the $m$-th layer are connected to every neuron in the previous layer, these neurons are called internal outputs of the $m$-th layer, denoted as $O_{I}^{m}$. $O^{m}$ denotes the total outputs of the $m$-th layer. The forward computing of the fractional-order deep BP neural networks is as follows:

$$ \mathrm{net}^{m}=W^{m-1}O^{m-1},\qquad O_{I}^{m}=F^{m}\bigl(\mathrm{net}^{m}\bigr), \tag{10} $$

$$ O^{m}=\begin{bmatrix}O_{I}^{m}\\ O_{E}^{m}\end{bmatrix}. \tag{11} $$

Particularly, external outputs can exist in any layer except the last one. With the square error function, the error corresponding to the $j$-th sample can be denoted as

$$ E_{j}=\frac{1}{2}\sum_{k=1}^{n_{M}}\bigl(y_{jk}-O_{jk}^{M}\bigr)^{2}, \tag{12} $$

where $y_{jk}$ denotes the $k$-th element of $y_{j}$ and $O_{jk}^{M}$ denotes the $k$-th element of $O^{M}$.
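A minimal plain-Python sketch of the forward computation and the per-sample squared error (our own illustration; the paper publishes no code, and external neurons are omitted for brevity, so every layer here is fully connected):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x):
    """Forward pass: each entry of weights is one layer's weight matrix,
    stored as a list of rows; every output passes through the sigmoid."""
    o = x
    for W in weights:
        o = [sigmoid(sum(w * v for w, v in zip(row, o))) for row in W]
    return o

def sample_error(y, o):
    """Squared error for one sample: 0.5 * sum_k (y_k - o_k)^2."""
    return 0.5 * sum((yk - ok) ** 2 for yk, ok in zip(y, o))
```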

The total error of the neural networks is defined as

$$ E=\sum_{j=1}^{J}E_{j}. \tag{13} $$

In order to minimize the total error of the fractional-order deep BP neural network, the weights are updated by the fractional gradient descent method with the Caputo derivative. Let $\alpha>0$ denote the fractional order and $c$ the lower terminal of the Caputo derivative. The backpropagation of fractional-order deep BP neural networks can be derived with the following steps.

Firstly, we define that

$$ \delta^{m}=\frac{\partial E}{\partial \mathrm{net}^{m}}. \tag{14} $$

According to (13), we can know that

$$ \delta^{M}=-\bigl(y_{j}-O^{M}\bigr)\circ F^{M\prime}\bigl(\mathrm{net}^{M}\bigr), \tag{15} $$

where $\circ$ denotes the elementwise (Hadamard) product.

Then the relationship between $\delta^{m}$ and $\delta^{m+1}$ can be given by

$$ \delta_{i}^{m}=F^{m\prime}\bigl(\mathrm{net}_{i}^{m}\bigr)\sum_{k}W_{ki}^{m}\delta_{k}^{m+1}, \tag{16} $$

which in matrix form reads

$$ \delta^{m}=\Bigl(\bigl(W^{m}\bigr)^{T}\delta^{m+1}\Bigr)_{I}\circ F^{m\prime}\bigl(\mathrm{net}^{m}\bigr), \tag{17} $$

where $(\cdot)_{I}$ keeps the components corresponding to the internal outputs of the $m$-th layer.

Then, according to the chain rule and (17), we have

$$ \frac{\partial^{\alpha}E}{\partial\bigl(W^{m}\bigr)^{\alpha}}=\frac{\partial E}{\partial W^{m}}\circ\frac{\bigl(W^{m}-c\bigr)^{1-\alpha}}{\Gamma(2-\alpha)}=\delta^{m+1}\bigl(O^{m}\bigr)^{T}\circ\frac{\bigl(W^{m}-c\bigr)^{1-\alpha}}{\Gamma(2-\alpha)}, \tag{18} $$

where $c$ is the lower terminal of the Caputo derivative, and the power and quotient are taken elementwise.

The updating formula is

$$ W^{m}(t+1)=W^{m}(t)-\eta\,\frac{\partial^{\alpha}E}{\partial\bigl(W^{m}(t)\bigr)^{\alpha}}, \tag{19} $$

where $t$ denotes the $t$-th iteration and $\eta$ is the learning rate.
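Scalar-wise, one Caputo fractional gradient step can be sketched as follows (our own illustration with hypothetical names `caputo_grad` and `update`; `c` is the lower terminal and `w > c` is assumed so the fractional power is real). Note that for $\alpha=1$ the fractional factor equals 1 and the rule reduces to ordinary gradient descent:

```python
from math import gamma

def caputo_grad(grad, w, c, alpha):
    """Fractional gradient of a single weight:
    dE/dw * (w - c)^(1 - alpha) / Gamma(2 - alpha)."""
    return grad * (w - c) ** (1.0 - alpha) / gamma(2.0 - alpha)

def update(w, grad, c, alpha, lr):
    """One fractional gradient descent step on a single weight."""
    return w - lr * caputo_grad(grad, w, c, alpha)
```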

##### 3.2. Fractional Deep BP Neural Networks with Regularization

The fractional-order BP neural network can easily be overfitted when the training set is small. $L_{2}$ regularization is a useful way to prevent models from overfitting without modifying the architecture of the network. Therefore, by introducing the regularization term into the total error, the modified error function can be presented as

$$ \tilde{E}=E+\frac{\lambda}{2}\|W\|^{2}, \tag{20} $$

where $\|W\|^{2}$ denotes the sum of squares of all weights and $\lambda$ denotes the regularization parameter.

By introducing (18), we have

$$ \frac{\partial^{\alpha}\tilde{E}}{\partial\bigl(W^{m}\bigr)^{\alpha}}=\Bigl(\delta^{m+1}\bigl(O^{m}\bigr)^{T}+\lambda W^{m}\Bigr)\circ\frac{\bigl(W^{m}-c\bigr)^{1-\alpha}}{\Gamma(2-\alpha)}. \tag{21} $$

The updating formula is

$$ W^{m}(t+1)=W^{m}(t)-\eta\,\frac{\partial^{\alpha}\tilde{E}}{\partial\bigl(W^{m}(t)\bigr)^{\alpha}}, \tag{22} $$

where $t$ denotes the $t$-th iteration and $\eta$ is the learning rate.
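A toy one-dimensional run (our own construction, not an experiment from the paper) shows how the regularization term shifts the extreme point: minimizing $E(w)=\tfrac12(w-3)^2$ with an $L_2$ penalty $\tfrac{\lambda}{2}w^2$ under the fractional update drives $w$ to the regularized optimum $3/(1+\lambda)$ rather than to 3:

```python
from math import gamma

def frac_grad_l2(grad, w, c, alpha, lam):
    """Fractional gradient of the regularized error:
    (dE/dw + lam * w) * (w - c)^(1 - alpha) / Gamma(2 - alpha), assuming w > c."""
    return (grad + lam * w) * (w - c) ** (1.0 - alpha) / gamma(2.0 - alpha)

w, lam, alpha, c, lr = 1.0, 0.1, 0.9, 0.0, 0.1
for _ in range(500):
    w -= lr * frac_grad_l2(w - 3.0, w, c, alpha, lam)
# w is now close to 3 / (1 + lam), i.e. about 2.727, not 3.0
```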

#### 4. Convergence Analysis

In this section, the convergence of the proposed fractional-order BP neural network is analyzed. According to previous studies [39–42], there are four necessary conditions for the convergence of BP neural networks:

(1) The activation functions are bounded and infinitely differentiable on $\mathbb{R}$, and all of their corresponding derivatives are also continuous and bounded on $\mathbb{R}$. This condition can be easily satisfied because the most common sigmoid activation functions are uniformly bounded on $\mathbb{R}$ and infinitely differentiable.

(2) The boundedness of the weight sequence $\{W(t)\}$ is valid during the training procedure, and the weights stay in a domain with a certain boundary.

(3) The learning rate has an upper bound.

(4) Let $W$ denote the weights matrix that consists of all weights, and let $\Phi$ be the $\alpha$-order stationary point set of the error function, that is, the set of points where the $\alpha$-order gradient vanishes. One necessary condition is that $\Phi$ is a finite set.

Then, the influence of regularization on the convergence is derived by using the fractional-order variational method.

According to (20), $\tilde{E}$ is defined as a fractional-order multivariable function. The proposed fractional-order BP algorithm is to minimize $\tilde{E}$. Let $W^{*}$ denote the fractional-order extreme point of $\tilde{E}$ and let $W$ denote an admissible point. In addition, $W^{*}$ is composed of $W^{*m}$, where $W^{*m}$ denotes the weights matrix between the $m$-th and $(m+1)$-th layer when $\tilde{E}$ reaches the extreme value. $W$ is composed of $W^{m}$, where $W^{m}$ corresponds to $W^{*m}$. The initial weights are random values, so the initial points of weights can be represented as $W^{m}=W^{*m}+\varepsilon^{m}$, where $\varepsilon$ is a vector that consists of small parameters $\varepsilon^{m}$, and $\varepsilon^{m}$ corresponds to $W^{m}$ and $W^{*m}$. If $\varepsilon\to 0$, it means $W\to W^{*}$; then $\tilde{E}(W)\to\tilde{E}(W^{*})$, and $\tilde{E}$ reaches the extreme value. Thus, the process of training the BP neural networks from a random initial weight $W$ to $W^{*}$ can be treated as the process of training $\varepsilon$ with a random initial value to $0$.

The fractional-order derivative of $\tilde{E}$ with respect to $\varepsilon$ is given as

$$ \frac{\partial^{\alpha}\tilde{E}}{\partial\varepsilon^{\alpha}}, \tag{23} $$

where $\alpha$ is the fractional order, which is a positive real number.

From (23), we can see that when $\varepsilon\to 0$, if the $\alpha$-order differential of $\tilde{E}$ with respect to $\varepsilon$ exists, $\tilde{E}$ has an $\alpha$-order extreme point at $W^{*}$ and we have

$$ \left.\frac{\partial^{\alpha}\tilde{E}}{\partial\varepsilon^{\alpha}}\right|_{\varepsilon\to 0}=0. \tag{24} $$

In this case, the output of each layer in the neural networks is still given by (10) and (11), and the input of each layer is turned into the following:

$$ \mathrm{net}^{m}=\bigl(W^{*\,m-1}+\varepsilon^{m-1}\bigr)O^{m-1}. \tag{25} $$

When $\varepsilon\to 0$, we have

$$ \lim_{\varepsilon\to 0}\mathrm{net}^{m}=W^{*\,m-1}O^{m-1}. \tag{26} $$

Without loss of generality, according to (18), for the $m$-th layer of the networks, the $\alpha$-order differential of $\tilde{E}$ with respect to $\varepsilon^{m}$ can be calculated as

$$ \frac{\partial^{\alpha}\tilde{E}}{\partial\bigl(\varepsilon^{m}\bigr)^{\alpha}}=\Bigl(\delta^{m+1}\bigl(O^{m}\bigr)^{T}+\lambda W^{m}\Bigr)\circ\frac{\bigl(\varepsilon^{m}\bigr)^{1-\alpha}}{\Gamma(2-\alpha)}, \tag{27} $$

where $\varepsilon^{m}$ denotes the column vector of perturbations of $W^{m}$.

Since the value of $\varepsilon^{m}$ is stochastic, according to the variation principle [43], to allow (24) to be set up, a necessary condition is that for every layer of the networks

$$ \delta^{m+1}\bigl(O^{m}\bigr)^{T}+\lambda W^{m}=0. \tag{28} $$

Secondly, without loss of generality, for we have

To allow (29) to be set up, a necessary condition is

With (28) and (30), the Euler-Lagrange equation of $\tilde{E}$ can be written as

$$ \frac{\partial E}{\partial W^{m}}+\lambda W^{m}=0,\qquad m=1,2,\ldots,M-1. \tag{31} $$

Equation (31) is the necessary condition for the convergence of the proposed fractional-order BP neural networks with regularization. From (31), we can see that if $\lambda\neq 0$, then $\partial E/\partial W^{m}=-\lambda W^{m}\neq 0$. $\partial E/\partial W^{m}$ is the first-order derivative of $E$ in terms of $W^{m}$ and can be calculated by $\delta^{m+1}$ and the input sample $x_{j}$. It means that the extreme point of the proposed algorithm is not equal to the extreme point of the integer-order BP algorithm or the fractional-order BP algorithm without regularization. The extreme point changes with the different values of $\lambda$ and the fractional order $\alpha$. In addition, it is also clear that the regularization parameter $\lambda$ is bounded, since the values of input samples and weights are bounded and $\lambda$ is a constant during the training process.

#### 5. Experiments

In this section, the following simulations were carried out to evaluate the performance of the presented algorithm. The simulations have been performed on the MNIST handwritten digital dataset. Each digit in the dataset is a 28 × 28 image. Each image is associated with a label from 0 to 9. We divided each image into four parts, which were top-left, bottom-left, bottom-right, and top-right, and each part was a 14 × 14 matrix. We vectorized each part of the image as a 196 × 1 vector and each label as a 10 × 1 vector.
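The preprocessing described above can be sketched as follows (our own helper names; an image is represented as 28 rows of 28 pixel values):

```python
def split_and_vectorize(img):
    """Split a 28x28 image into four 14x14 quadrants, in the order
    top-left, bottom-left, bottom-right, top-right, and flatten each
    quadrant into a 196-element vector."""
    def quad(rows, cols):
        return [img[r][c] for r in rows for c in cols]
    top, bottom = range(0, 14), range(14, 28)
    left, right = range(0, 14), range(14, 28)
    return [quad(top, left), quad(bottom, left),
            quad(bottom, right), quad(top, right)]

def one_hot(label):
    """Encode a digit label 0-9 as a 10-element indicator vector."""
    return [1.0 if i == label else 0.0 for i in range(10)]
```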

In order to identify the handwritten digits in the MNIST dataset, a neural network with 8 layers was proposed. Figure 1 shows the topological structure of the neural networks. For the first four layers of the network, each layer has 196 external neurons and 32 internal neurons. The outputs of the external neurons are in turn the four parts of an image, and the outputs of the internal neurons of the first layer are 1. The last four layers have no external neurons. The fifth, sixth, and seventh layers have 64 internal nodes each, and the output layer has ten nodes. The activation functions of all neurons except the first layer are sigmoid functions, which can be given as follows:

$$ f(x)=\frac{1}{1+e^{-x}}. $$