Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 678965, 12 pages

http://dx.doi.org/10.1155/2015/678965

## Fuzzy Wavelet Neural Network Using a Correntropy Criterion for Nonlinear System Identification

^{1}Postgraduate Program in Electrical and Computer Engineering (PPgEEC), Federal University of Rio Grande do Norte, 59078-970 Natal, RN, Brazil
^{2}Department of Computer Engineering, Federal University of Rio Grande do Norte, 59078-970 Natal, RN, Brazil
^{3}Department of Electrical Engineering, Federal University of Rio Grande do Norte, 59078-970 Natal, RN, Brazil

Received 17 September 2014; Accepted 28 November 2014

Academic Editor: Yudong Zhang

Copyright © 2015 Leandro L. S. Linhares et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Recent research has demonstrated that Fuzzy Wavelet Neural Networks (FWNNs) are an efficient tool for identifying nonlinear systems. In these structures, features related to fuzzy logic, wavelet functions, and neural networks are combined in an architecture similar to the Adaptive Neurofuzzy Inference System (ANFIS). In practical applications, the experimental data set used in the identification task often contains unknown noise and outliers, which decrease the FWNN model reliability. In order to reduce the negative effects of these erroneous measurements, this work proposes the direct use of a similarity measure based on information theory in the FWNN learning procedure. The Mean Squared Error (MSE) cost function is replaced by the Maximum Correntropy Criterion (MCC) in the traditional error backpropagation (BP) algorithm. The input-output maps of a real nonlinear system studied in this work are identified from an experimental data set corrupted by different outlier rates and additive white Gaussian noise. The results demonstrate the advantages of the proposed MCC-based cost function over the MSE. This work also investigates the influence of the kernel size, the only free parameter of correntropy, on the performance of the MCC in the BP algorithm.

#### 1. Introduction

System identification is a modeling procedure in which a mathematical representation of the input-output maps of a dynamical system is obtained with the aid of experimental data. This procedure is a prominent alternative for the efficient modeling of complex systems without requiring sophisticated mathematical concepts. For this reason, system identification plays an important role in control engineering tasks such as classification and decision making, monitoring, control, and prediction [1–8].

Artificial Neural Networks (ANNs) represent one of the most successful identification techniques used to model nonlinear dynamical systems [9]. This is due to their ability to learn from examples, associated with intrinsic robustness and nonlinear characteristics [10–13]. Recently, a wide variety of network structures have been used to model the input-output maps of nonlinear systems [5, 14, 15]. The Multilayer Perceptron (MLP), the Radial Basis Function (RBF) network, neurofuzzy hybrid structures such as the Adaptive Neurofuzzy Inference System (ANFIS), and Wavelet Neural Networks (WNNs) are examples of ANNs commonly used in applications involving nonlinear systems [9, 13, 16, 17].

WNNs combine the flexibility of ANNs with the curve-fitting ability of wavelet functions [18–20]. Moreover, their domain of validity can be extended by adding an extra layer of fuzzy structures that provides a coarse partition of the universe of discourse, resulting in Fuzzy Wavelet Neural Networks (FWNNs) [5]. The architecture of the FWNN is very close to the traditional ANFIS [21], although wavelets are used as membership functions (MFs) [22, 23] or in the consequent part of the fuzzy rules, through the use of WNNs as local models. Several research works apply FWNNs to modeling, control, function approximation, and nonlinear system identification, among other tasks [6, 24–28].

In [29], Linhares et al. evaluate an alternative FWNN structure to identify the nonlinear dynamics of a multisection liquid tank. The proposed structure is similar to the ones presented by Yilmaz and Oysal [5], Abiyev and Kaynak [6], and Lu [24]. However, the FWNN presented in [29] uses only wavelets in the consequent part of the fuzzy rules: the wavelets in each node of the consequent layer are weighted by the activation signals of the fuzzy rules. Therefore, the local models of this FWNN are represented solely by a set of wavelet functions, which differs from [5, 6, 24]. The results presented in [29] demonstrate that the modified FWNN structure maintains the generalization capability and other important features of traditional FWNNs, despite its reduced complexity.

In practical applications, the experimental data set used in the identification procedure is often corrupted by unknown noise and outliers. Outliers are incorrect measurements that markedly deviate from the typical range of the other observations [30], and their main source is the sporadic malfunctioning of sensors and equipment [31]. The presence of noise and outliers in the experimental data negatively affects the performance and reliability of the model under identification, because the model tries to fit these undesired measurements [30, 32, 33]. Although many outlier detection methods have been presented in the literature, most approaches are not able to detect all the outliers, so the data obtained after applying such methods may still be contaminated [30, 31].

Generally, the learning process of neural networks is based on a gradient method, for example, the classical error backpropagation (BP) algorithm, which uses the Mean Squared Error (MSE) as its cost function. However, the MSE is optimal for fitting an input-output relationship only when the probability density function (pdf) of the errors is Gaussian [34], whereas in most cases the error distribution is non-Gaussian [8]. Several studies demonstrate that replacing the traditional MSE with the Maximum Correntropy Criterion (MCC) is an effective approach for prediction and identification when the dynamical system is affected by unknown noise and outliers [7, 8, 30, 35]. The correntropy evaluation allows the extraction of additional information from the available data, because this similarity measure takes into account all the moments of a probability distribution that are typically not observed by the MSE [7].

In this work, the reliability of the FWNN recently proposed in [29] is evaluated when different percentages of outliers and noise contaminate the experimental data used to identify a nonlinear system. This neural network is used to identify the dynamic relationship between the input and output of a multisection liquid tank. The FWNN is trained with the BP algorithm, with the traditional MSE cost function replaced by the Maximum Correntropy Criterion using an adaptive adjustment of its kernel size, the free parameter of the MCC. The models obtained with each of the two cost functions are properly evaluated and compared. Despite the advantages of correntropy over the MSE, little effort has been reported towards the application of correntropy to identify nonlinear systems using neural networks [7, 8]. The results presented in this work demonstrate that the FWNN architecture proposed in [29] is less sensitive to the presence of outliers and noise when it is trained using the MCC. In addition, this work also investigates the influence of the kernel size on the performance of the MCC in the BP algorithm.

This paper is organized as follows. Section 2 presents the definition and the basic mathematical theory of the correntropy similarity measure. Section 3 describes the FWNN proposed in [29], which is applied in this work to identify an experimental nonlinear dynamical system considering the presence of outliers and noise. Section 4 presents the updating equations of the BP algorithm, which are modified according to the MCC. Section 5 describes the proposed identification architecture in detail. Section 6 presents the multisection liquid tank under study and evaluates the performance of the FWNN models obtained using the MSE and MCC cost functions, considering the presence of both outliers and noise in the experimental data. Finally, concluding remarks are given in Section 7.

#### 2. Correntropy

Correntropy is a generalized similarity measure between two arbitrary scalar random variables $X$ and $Y$, defined by [36]

$$V_\sigma(X, Y) = E\left[\kappa_\sigma(X, Y)\right] = \iint \kappa_\sigma(x, y)\, dF_{X,Y}(x, y), \tag{1}$$

where $F_{X,Y}(x,y)$ is the joint probability distribution, $E[\cdot]$ is the expectation operator, and $\kappa_\sigma(\cdot,\cdot)$ is a symmetric positive definite kernel. In this work, $\kappa_\sigma$ is a Gaussian kernel given as

$$\kappa_\sigma(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - y)^2}{2\sigma^2}\right), \tag{2}$$

where $\sigma$ is the width of the Gaussian, defined as the kernel size. The kernel size may be interpreted as the resolution at which correntropy measures similarity in a space with characteristics of high dimensionality [36].

By applying a Taylor series expansion to the Gaussian kernel in (2) and assuming that all the moments of the joint pdf are finite, (1) becomes

$$V_\sigma(X, Y) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n\, n!\, \sigma^{2n}}\, E\!\left[(X - Y)^{2n}\right]. \tag{3}$$

In practice, the joint pdf in (1) is unknown and only a finite amount of data is available, leading to the sample correntropy estimator defined by

$$\hat{V}_{N,\sigma}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \kappa_\sigma(x_i, y_i). \tag{4}$$
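As a minimal illustration, the sample estimator above can be computed directly. The Python sketch below (function names are chosen for illustration, not taken from the paper) evaluates the Gaussian kernel over the paired samples and averages it:

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel evaluated at the difference e = x - y."""
    return np.exp(-e**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def sample_correntropy(x, y, sigma=1.0):
    """Sample correntropy: the kernel averaged over the N paired samples."""
    e = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(gaussian_kernel(e, sigma)))
```

For identical signals the estimator attains its maximum value, $1/(\sqrt{2\pi}\,\sigma)$; any mismatch between the samples lowers it.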

Correntropy involves all the even moments of the difference between $X$ and $Y$. Compared with the MSE, which is a quadratic function in the joint input space, correntropy includes second-order and higher-order statistical information [37]. However, for sufficiently large values of $\sigma$, the second-order moment predominates and the measure approaches correlation [38].
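This large-kernel behavior can be checked numerically. The sketch below (Python/NumPy, with synthetic data invented for illustration) compares the sample correntropy against its two-term Taylor approximation, in which only the second-order moment survives:

```python
import numpy as np

# Synthetic paired signals (illustrative): y is a noisy copy of x.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 0.5 * rng.normal(size=1000)
e = x - y

sigma = 50.0  # deliberately large kernel size

# Sample correntropy with a Gaussian kernel.
v = np.mean(np.exp(-e**2 / (2 * sigma**2))) / (np.sqrt(2 * np.pi) * sigma)

# Two-term Taylor approximation: only the second-order moment E[e^2] remains.
approx = (1 - np.mean(e**2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
```

With a large kernel size the two quantities agree to within a tiny relative error, confirming that correntropy then behaves like a (scaled and shifted) second-order measure.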

Correntropy has been successfully used in a wide variety of applications where the signals are non-Gaussian or nonlinear, for example, automatic modulation classification [39], classification of pathological voices [40], and principal component analysis (PCA) [41].

##### 2.1. Maximum Correntropy Criterion for Model Estimation

The correntropy concept can be extended to model estimation. The model output $\hat{y} = f(\mathbf{x}; \boldsymbol{\theta})$ can be considered a mathematical expression of the unknown function, where $\mathbf{x}$ is an input set and $\boldsymbol{\theta}$ are the model parameters, which approximates the dependence on an output set $y$ [42].

Therefore, it is possible to determine the optimal solution for the MCC from (4) as [43]

$$\boldsymbol{\theta}^{*} = \arg\max_{\boldsymbol{\theta}} \frac{1}{N} \sum_{i=1}^{N} \kappa_\sigma(e_i), \tag{5}$$

where $e_i = y_i - \hat{y}_i$ and $\hat{y}_i = f(\mathbf{x}_i; \boldsymbol{\theta})$; these are the errors generated by the model during supervised learning for each of the $N$ training samples. It is worth mentioning that this criterion is used as the cost function of the BP algorithm to adjust the parameters of the FWNN.

One of the advantages of using correntropy in system identification lies in the robustness of this measure against impulsive noise, due to the use of the Gaussian kernel in (5), which is close to zero when the error $e_i$ is large, that is, when $x_i$ or $y_i$ is an outlier. Correntropy is positive and bounded: for the Gaussian kernel it satisfies $0 < V_\sigma(X, Y) \leq \frac{1}{\sqrt{2\pi}\,\sigma}$, reaching its maximum if and only if $X = Y$.
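The outlier insensitivity described above can be illustrated numerically. In the hedged sketch below (Python/NumPy; the residual values are invented for illustration), a single impulsive outlier is injected into an otherwise small residual sequence, and the MSE and sample MCC reactions are compared:

```python
import numpy as np

def mse(errors):
    """Mean Squared Error of a residual sequence."""
    e = np.asarray(errors, dtype=float)
    return float(np.mean(e**2))

def mcc(errors, sigma=1.0):
    """Sample MCC value: mean Gaussian kernel of the residuals."""
    e = np.asarray(errors, dtype=float)
    k = np.exp(-e**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return float(np.mean(k))

clean = np.full(100, 0.1)   # 100 small residuals
dirty = clean.copy()
dirty[0] = 50.0             # inject a single impulsive outlier

# The outlier inflates the MSE by orders of magnitude, while the MCC
# barely changes: the Gaussian kernel is essentially zero at e = 50.
```

This is exactly the mechanism that makes the MCC a robust cost function: an outlier saturates the kernel near zero instead of dominating the sum of squares.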

The kernel size $\sigma$ of the Gaussian is a free parameter that must be selected by the user [38]. Therefore, when correntropy is estimated, the resulting values depend on the selected kernel size. In addition, the kernel size influences the nature of the performance surface, the presence of local optima, the rate of convergence, and the robustness to impulsive noise during adaptation [37, 43]. If the training data set is not large enough, the kernel size must be chosen as a tradeoff between outlier rejection and estimation efficiency [44].

Some approaches can be employed to determine the kernel size, for example, the statistical method [45], Silverman's rule [46], cross-validation techniques [47, 48], and the shape of the prediction error distribution [44]. This work uses an adaptive kernel size algorithm [42], in which the kernel size is readjusted from the statistics of the current prediction errors as training progresses.
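Since the adaptive rule of [42] is not reproduced here, the sketch below illustrates one simple per-epoch alternative: recomputing the kernel size from the current training errors with Silverman's rule, one of the approaches listed above. The function name and constants follow the usual statement of that rule and are not taken from [42]:

```python
import numpy as np

def silverman_kernel_size(errors):
    """Per-epoch kernel size via Silverman's rule of thumb applied to the
    current training errors: sigma = 1.06 * std(e) * N^(-1/5).
    Illustrative only; NOT necessarily the adaptive rule of [42]."""
    e = np.asarray(errors, dtype=float)
    n = e.size
    return float(1.06 * np.std(e) * n ** (-1.0 / 5.0))
```

Called once per epoch on the residual vector, this shrinks the kernel as the errors shrink, keeping the resolution of the similarity measure matched to the current error scale.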

In order to assess the improved performance of an adaptive kernel size over fixed ones, Section 6 shows how the error evolves during FWNN training for different values of the kernel size.

#### 3. Fuzzy Wavelet Neural Networks

##### 3.1. Brief Review

Wavelets are obtained by scaling and translating a special function, localized in both time/space and frequency, called the mother wavelet $\psi$, which can be defined in such a way as to serve as a basis to describe other functions. Wavelets are extensively used in the fields of signal analysis, identification and control of dynamical systems, computer vision, and computer graphics, among other applications [49–52]. Given $\psi$, the corresponding family of wavelets is obtained by

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t - b}{a}\right), \quad a \neq 0,$$

where $\psi_{a,b}$ is obtained from $\psi$ by scaling it by a factor $a$ and translating it by $b$.

A WNN is a nonlinear regression structure that can represent input-output maps by combining wavelets with appropriate scalings and translations [53]. The output of a WNN is determined as follows:

$$y = \sum_{i=1}^{M} w_i\, \psi_{a_i, b_i}(\mathbf{x}) = \sum_{i=1}^{M} \frac{w_i}{\sqrt{|a_i|}}\, \psi\!\left(\frac{\mathbf{x} - b_i}{a_i}\right),$$

where $w_i$ are the synaptic weights, $\mathbf{x}$ is the input vector, and $a_i$ and $b_i$ are the parameters characterizing the wavelets.
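As a concrete sketch of the single-input case of these definitions, the Python code below builds a wavelet family from a Mexican hat mother wavelet, a common choice in WNNs, and evaluates a WNN output as a weighted sum of scaled and translated wavelets. All names and the choice of mother wavelet are illustrative, not taken from the paper:

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat mother wavelet (illustrative choice)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def wavelet_family(t, a, b):
    """psi_{a,b}(t) = |a|^{-1/2} psi((t - b)/a): scaling by a, translation by b."""
    return mexican_hat((t - b) / a) / np.sqrt(abs(a))

def wnn_output(x, weights, scales, translations):
    """Single-input WNN output: weighted sum of scaled/translated wavelets."""
    return sum(w * wavelet_family(x, a, b)
               for w, a, b in zip(weights, scales, translations))
```

In a full WNN the input is a vector and each node combines the input dimensions (e.g., via product wavelets), but the weighted-sum structure is the same.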

In a concise manner, the purpose of FWNNs is to incorporate WNNs into the ANFIS structure in order to obtain faster convergence and better approximation capabilities, possibly at the cost of a greater number of parameters to be adjusted. The fuzzy rules make it possible to handle uncertainties, while the wavelets contribute to improving the accuracy of the approximated input-output maps [6].

##### 3.2. FWNN Architecture

A particular instance of the FWNN proposed in [29] is applied in this work to identify a real nonlinear system, investigating its performance and reliability when the experimental data set is corrupted by unknown noise and outliers. In this FWNN architecture, the consequent part of the fuzzy rules is described only by wavelet functions, which differs from other structures such as those proposed in [5, 6, 24]. The basic architecture of the FWNN can be seen in Figure 1, and its layers are described as follows.