Mathematical Problems in Engineering

Volume 2015, Article ID 384183, 8 pages

http://dx.doi.org/10.1155/2015/384183

## Kernel Fisher Discriminant Analysis Based on a Regularized Method for Multiclassification and Application in Lithological Identification

Dejiang Luo^{1} and Aijiang Liu^{2}

^{1}College of Management Science, Chengdu University of Technology, Chengdu 610059, China

^{2}College of Geophysics, Chengdu University of Technology, Chengdu 610059, China

Received 19 July 2014; Revised 8 October 2014; Accepted 10 October 2014

Academic Editor: Jun Cheng

Copyright © 2015 Dejiang Luo and Aijiang Liu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This study aimed to construct a kernel Fisher discriminant analysis (KFDA) method from well logs for lithology identification purposes. KFDA, via the use of a kernel trick, greatly improves the multiclassification accuracy compared with Fisher discriminant analysis (FDA). The optimal kernel Fisher projection of KFDA can be expressed as a generalized characteristic equation. However, this characteristic equation is difficult to solve directly; therefore, a regularized method is used to solve it. In the absence of a principled method for determining the value of the regularized parameter, the parameter is often chosen based on expert experience or specified by trial and error. In this paper, an improved KFDA (IKFDA) is proposed to obtain the optimal regularized parameter by means of a numerical method. The approach exploits the optimal regularized parameter selection ability of KFDA to obtain improved classification results. The method is simple and not computationally complex. The IKFDA was applied to the *Iris* data sets for training and testing purposes and subsequently to lithology data sets. The experimental results illustrate that nonlinearly separable data can be successfully separated, confirming that the method is effective.

#### 1. Introduction

China’s tight clastic rock reservoirs are widely distributed, containing sediments that were deposited during the Carboniferous, Permian, Triassic, and Jurassic periods. The reservoirs in Western Sichuan and Erdos are the most representative. The West Sichuan depression is located in the Western Sichuan Basin, which belongs to the western depression belt of the Yangtze Platform, adjacent to the Longmenshan Fault Zone. A tight gas reservoir was discovered in the Xujiahe and Shaximiao formations that occur in this area. Tight clastic reservoirs are noted for their low porosity and for the dense, multilayered stacking and strong heterogeneity that result from their complexity and particularity. These characteristics complicate the identification of the lithology, which would enable the prediction of the properties of a reservoir. Previous research [1, 2] identified the reservoir lithology consisting of mudstone, sandstone, and siltstone in the AC area of western Sichuan. The cross plot of sandstone and siltstone shows that these rock types overlap and are mixed together, which is a linearly nonseparable case. Cross plots and mathematical models have been applied extensively to lithology identification in previous studies. For example, Hsieh et al. constructed a fuzzy lithology system from well logs to identify the formation lithology [3], while Shao et al. applied an improved BP neural network algorithm, based on a momentum factor, to lithology recognition [4]. Zhang et al. used Fisher discrimination to identify volcanic lithology using regular logging data [5]. However, it is very difficult to identify the lithology of tight clastic rock reservoirs with the above methods. Thus, this paper proposes kernel Fisher discriminant analysis (KFDA) for tight clastic rock lithology identification.

KFDA has its roots in Fisher discriminant analysis (FDA) and is a nonlinear scheme for two-class and multiclass problems [6]. KFDA works by mapping the low-dimensional sample space into a high-dimensional feature space, in which FDA is subsequently conducted. Research on KFDA includes both applied and theoretical work. Billings et al. replaced the kernel matrix with its submatrix in order to simplify the computation [7, 8]; Liu et al. proposed a new criterion for KFDA that maximizes the uniformity of class-pair separabilities, evaluated by the entropy of the normalized class-pair separabilities [9]; Wang et al. considered discriminant vectors to be linear combinations of “nodes” that are part of the training samples and therefore proposed a fast kernel Fisher discriminant analysis technique [10, 11]. Wang et al. proposed that the nodes be the most representative training samples [12]. Optimal kernel selection is one of the areas of theoretical research that has attracted considerable attention. Fung et al. developed an iterative method based on a quadratic programming formulation of FDA [13]. Khemchandani et al. considered the problem of finding the data-dependent “optimal” kernel function via second-order cone programming [14]. KFDA, with its strong nonlinear feature extraction ability, is becoming a powerful tool for solving identification and classification problems. Hence, it has been applied widely and successfully in many areas, such as face recognition, fault diagnosis, classification, and the prediction of the existence of hydrocarbon reservoirs [15–20].

The principle that underlies KFDA is that input data are mapped into a high-dimensional feature space by a nonlinear function, after which FDA is used for recognition or classification in the feature space. KFDA requires factorization of the Gram matrix into the kernel within-class scatter matrix $K_w$ and the kernel between-class scatter matrix $K_b$. KFDA can finally be attributed to the solution of a generalized eigenvalue problem $K_b \alpha = \lambda K_w \alpha$. As the matrix $K_w$ is often singular, a regularized method is often used to solve the problem, which is transformed into an ordinary eigenvalue problem by choosing a small positive number $\mu$, in which case $K_w$ is replaced with $K_w + \mu I$. Previous studies have shown that the classification ability of KFDA depends on the value of $\mu$; therefore, choosing an appropriate value of $\mu$ is very important. In many practical applications, the parameter $\mu$ is specified according to experience or experimental results.

This paper proposes a new approach for the selection of the regularized parameter to obtain the best classification results, and the improved KFDA is applied to both the *Iris* data sets and lithology data sets. The paper is organized as follows. Section 2 summarizes kernel Fisher discriminant analysis. In Section 3, a numerical method for finding an optimal parameter is proposed by introducing a regularized method for KFDA. The experimental results are given in Sections 4 and 5, while Section 6 presents the concluding remarks.

#### 2. Kernel Fisher Discriminant Analysis

Let $X = \{x_1, x_2, \ldots, x_n\}$ be the data set that contains $c$ classes in the $d$-dimensional real space $\mathbb{R}^d$, and let $n_i$ samples belong to the $i$th class ($\sum_{i=1}^{c} n_i = n$). FDA is used for lithology identification by searching for the optimal projection vectors $w$, so that the projected lithology samples of different classes have minimum within-class scatter. FDA is given by the vector $w$ that maximizes the Fisher discriminant function as

$$J(w) = \frac{w^{T} S_b w}{w^{T} S_w w},$$

where $S_w$ is the within-class scatter matrix and $S_b$ is the between-class scatter matrix. FDA is essentially a linear method, which makes it very difficult to separate nonlinearly separable samples.
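As a minimal NumPy sketch of the linear FDA step above (variable names and the toy data are illustrative, not from the paper), the scatter matrices $S_w$ and $S_b$ are accumulated per class and the projection vectors are the leading eigenvectors of $S_w^{-1} S_b$:

```python
import numpy as np

def fda_projection(X, y):
    """Return FDA projection vectors (columns) for data X (n x d) and labels y."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)
    # Maximizing J(w) = (w^T S_b w)/(w^T S_w w) leads to the
    # eigenvectors of S_w^{-1} S_b, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order].real

# Toy two-class example (illustrative data)
X = np.array([[1.0, 2.0], [1.2, 1.9], [3.0, 4.1], [3.1, 3.9]])
y = np.array([0, 0, 1, 1])
W = fda_projection(X, y)
```

Projecting the samples onto the first column of `W` then maximizes class separation in the projected one-dimensional space.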

KFDA significantly improves the classification ability of FDA for nonlinearly separable samples via the use of a kernel trick. To adapt to nonlinear cases, each sample $x$ is mapped by a nonlinear function $\phi$ from the lower-dimensional sample space into a high-dimensional feature space $F$. Note that $\phi(x_j^i)$ represents the image of the $j$th sample in class $i$. Let $m^{\phi}$ be the mean vector of the population, and let $m_i^{\phi}$ be the mean vector of class $i$. In the feature space $F$, the total scatter matrix $S_t^{\phi}$, the within-class scatter matrix $S_w^{\phi}$, and the between-class scatter matrix $S_b^{\phi}$ can be defined as

$$S_w^{\phi} = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \left(\phi(x_j^i) - m_i^{\phi}\right)\left(\phi(x_j^i) - m_i^{\phi}\right)^{T},$$

$$S_b^{\phi} = \sum_{i=1}^{c} n_i \left(m_i^{\phi} - m^{\phi}\right)\left(m_i^{\phi} - m^{\phi}\right)^{T},$$

$$S_t^{\phi} = S_w^{\phi} + S_b^{\phi}.$$

Lithology identification by KFDA can be attributed to the optimization of the kernel Fisher criterion function as follows:

$$J(w) = \frac{w^{T} S_b^{\phi} w}{w^{T} S_w^{\phi} w},$$

where $w$ represents the optimal projection vector in $F$. The high (possibly infinite) dimension of the feature space makes it impossible to directly calculate the optimal discriminant vector $w$. A solution for this problem is to use the kernel trick as follows:

$$k(x_i, x_j) = \left\langle \phi(x_i), \phi(x_j) \right\rangle.$$
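The kernel trick evaluates inner products in $F$ directly in the input space, so $\phi$ never has to be computed explicitly. A minimal sketch, assuming a Gaussian (RBF) kernel as an illustrative choice (the paper does not fix the kernel here):

```python
import numpy as np

# Kernel trick: k(x_i, x_j) = <phi(x_i), phi(x_j)> evaluated in input space.
# The Gaussian (RBF) kernel and gamma value are illustrative assumptions.
def rbf(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

x1 = np.array([1.0, 2.0])
x2 = np.array([1.5, 1.0])
k12 = rbf(x1, x2)   # inner product in F, computed without phi
```

Note that `rbf(x, x)` is always 1, reflecting that every mapped sample has unit norm in the RBF feature space.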

According to the theory of reproducing kernels [6], any solution $w \in F$ must lie in the span of the mapped training samples, as follows:

$$w = \sum_{i=1}^{n} \alpha_i \phi(x_i).$$

In $F$, any test sample $x$ can be projected onto $w$ to give the following equation:

$$w^{T} \phi(x) = \sum_{i=1}^{n} \alpha_i k(x_i, x).$$

In $F$, the kernel within-class scatter matrix $K_w$ and the kernel between-class scatter matrix $K_b$ can be defined as

$$K_w = \sum_{i=1}^{c} \sum_{j=1}^{n_i} \left(\xi_j^i - \bar{\xi}_i\right)\left(\xi_j^i - \bar{\xi}_i\right)^{T}, \qquad K_b = \sum_{i=1}^{c} n_i \left(\bar{\xi}_i - \bar{\xi}\right)\left(\bar{\xi}_i - \bar{\xi}\right)^{T},$$

where $\xi_j^i = \left(k(x_1, x_j^i), \ldots, k(x_n, x_j^i)\right)^{T}$ is the kernel vector of the $j$th sample in class $i$, $\bar{\xi}_i$ is its class mean, and $\bar{\xi}$ is the overall mean. According to the properties of the generalized Rayleigh quotient, the optimal solution vector $\alpha$ is obtained by maximizing the criterion function

$$J(\alpha) = \frac{\alpha^{T} K_b \alpha}{\alpha^{T} K_w \alpha}, \tag{11}$$

which is equivalent to the solution of the generalized characteristic equation as follows:

$$K_b \alpha = \lambda K_w \alpha. \tag{12}$$
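The kernel scatter matrices above can be built directly from the Gram matrix. A minimal NumPy sketch, assuming an RBF kernel and using columns of the Gram matrix as the kernel vectors $\xi_j$ (toy data and parameter values are illustrative):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_scatter(K, y):
    """Kernel within-class (K_w) and between-class (K_b) scatter matrices."""
    n = K.shape[0]
    m_all = K.mean(axis=1)            # overall mean kernel vector
    K_w = np.zeros((n, n))
    K_b = np.zeros((n, n))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        m_c = K[:, idx].mean(axis=1)  # class-mean kernel vector
        for j in idx:
            diff = (K[:, j] - m_c).reshape(-1, 1)
            K_w += diff @ diff.T
        diff = (m_c - m_all).reshape(-1, 1)
        K_b += len(idx) * (diff @ diff.T)
    return K_w, K_b

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
K = rbf_kernel(X)
K_w, K_b = kernel_scatter(K, y)
```

Both matrices are $n \times n$, symmetric, and positive semidefinite, which is why $K_w$ can (and in practice often does) become singular.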

#### 3. Choosing the Regularized Parameter

If $K_w$ is a nonsingular matrix, then the optimal vectors $\alpha$, obtained by maximizing (11), are equivalent to the eigenvectors of $K_w^{-1} K_b$ corresponding to the top $c-1$ largest eigenvalues [12, 21]. Equation (12) can be described as

$$K_w^{-1} K_b \alpha = \lambda \alpha. \tag{13}$$

The solution of practical problems requires the use of $n$ training samples to estimate the $n$-dimensional scatter structure; therefore, $K_w$ is a singular matrix. This means that it is often not possible to use (13). However, it is possible to improve the stability of the numerical method by using a regularized method as follows:

$$K_{\mu} = K_w + \mu I, \tag{14}$$

where $\mu$ is a small, positive number and $I$ is the identity matrix. Then, (12) can be expressed as

$$\left(K_w + \mu I\right)^{-1} K_b \alpha = \lambda \alpha. \tag{15}$$

When KFDA is used to solve problems of an applied nature, the parameter $\mu$ is usually determined according to experience or the results of experiments. This paper uses a numerical analysis method to solve for the parameter; hence, the determinant of $K_w + \mu I$ can be regarded as a function of $\mu$:

$$f(\mu) = \det\left(K_w + \mu I\right). \tag{16}$$

When the function $f(\mu)$ is stable and its value tends to zero,

$$f(\mu) \to 0, \tag{17}$$

the corresponding parameter $\mu$ gives the best classification performance.
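A numerical sketch of the parameter search described above: scan a grid of $\mu$ values, monitor $f(\mu) = \det(K_w + \mu I)$, and solve the regularized eigenproblem (15). The toy matrices and the grid bounds are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def regularized_kfda(K_w, K_b, mu):
    """Solve (K_w + mu*I)^{-1} K_b alpha = lambda alpha; return sorted eigenpairs."""
    n = K_w.shape[0]
    A = np.linalg.solve(K_w + mu * np.eye(n), K_b)
    eigvals, eigvecs = np.linalg.eig(A)
    order = np.argsort(-eigvals.real)
    return eigvals[order].real, eigvecs[:, order].real

def scan_mu(K_w, mus):
    """Determinant f(mu) = det(K_w + mu*I) over a grid of candidate mu values."""
    n = K_w.shape[0]
    return np.array([np.linalg.det(K_w + mu * np.eye(n)) for mu in mus])

# Toy example with a rank-deficient (singular) K_w
K_w = np.array([[1.0, 1.0], [1.0, 1.0]])
K_b = np.array([[2.0, 0.0], [0.0, 1.0]])
mus = np.logspace(-6, 0, 7)
dets = scan_mu(K_w, mus)                       # f(mu): small and stable for small mu
vals, vecs = regularized_kfda(K_w, K_b, mu=1e-3)
```

Because `K_w` is singular here, `f(mu)` approaches zero as `mu` shrinks, while the regularized system stays solvable for any positive `mu`.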

#### 4. Experiments

##### 4.1. Experimental Settings

The *Iris* data set is often used to test discriminant analysis algorithms [22–24]. This data set is divided into three classes, which represent three different varieties of the *Iris* flower: C1, *Iris setosa*; C2, *Iris versicolor*; and C3, *Iris virginica*. Each sample takes petal length, petal width, sepal length, and sepal width as four-dimensional variables. There are 150 samples in this data set, with 50 samples in each class. The results are plotted as scatter plots (Figure 1) and show that classes C1 and C2 and classes C1 and C3 are linearly separable, whereas classes C2 and C3 are nonlinearly separable. The aim was to address this problem by using KFDA with different values of $\mu$ for classification purposes [25].
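An end-to-end illustration of the procedure on synthetic nonlinearly separable data (two concentric rings standing in for the overlapping C2/C3 case; the kernel, `gamma`, and the value of `mu` are assumptions for the sketch, not the paper's experimental settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rings(n_per_class=40):
    """Two noisy concentric rings: a nonlinearly separable two-class problem."""
    t = rng.uniform(0, 2 * np.pi, n_per_class)
    inner = np.c_[0.5 * np.cos(t), 0.5 * np.sin(t)] + rng.normal(0, 0.05, (n_per_class, 2))
    t = rng.uniform(0, 2 * np.pi, n_per_class)
    outer = np.c_[2.0 * np.cos(t), 2.0 * np.sin(t)] + rng.normal(0, 0.05, (n_per_class, 2))
    return np.vstack([inner, outer]), np.array([0] * n_per_class + [1] * n_per_class)

def rbf(X, Z, gamma=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * d2)

X, y = make_rings()
K = rbf(X, X)
n = K.shape[0]

# Kernel scatter matrices (same construction as in Section 2)
m_all = K.mean(axis=1)
K_w = np.zeros((n, n))
K_b = np.zeros((n, n))
for c in (0, 1):
    idx = np.where(y == c)[0]
    m_c = K[:, idx].mean(axis=1)
    for j in idx:
        dj = (K[:, j] - m_c)[:, None]
        K_w += dj @ dj.T
    dc = (m_c - m_all)[:, None]
    K_b += len(idx) * (dc @ dc.T)

mu = 1e-3  # regularized parameter (illustrative value)
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(K_w + mu * np.eye(n), K_b))
alpha = eigvecs[:, np.argmax(eigvals.real)].real

# Project training samples and classify by nearest projected class mean
proj = K @ alpha
means = [proj[y == c].mean() for c in (0, 1)]
pred = np.array([0 if abs(p - means[0]) < abs(p - means[1]) else 1 for p in proj])
accuracy = (pred == y).mean()
```

Linear FDA cannot separate these rings, while the kernelized projection separates them cleanly, which mirrors the C2/C3 situation the experiments target.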