Research Article  Open Access
Statistical Fractal Models Based on GNDPCA and Its Application on Classification of Liver Diseases
Abstract
A new method is proposed to establish a statistical fractal model for the classification of liver diseases. First, fractal theory is used to construct a high-order tensor. Then Generalized N-dimensional Principal Component Analysis (GNDPCA) is used to establish the statistical fractal model and to select features from the liver region, with different features given different weights. Finally, a Support Vector Machine optimized by Ant Colony Optimization (ACOSVM) is used to build the classifier for recognizing liver diseases. To verify the effectiveness of the proposed method, the PCA eigenface method and the conventional SVM method are chosen for comparison. The experimental results show that the proposed method reconstructs the liver volume better and improves the classification accuracy of liver diseases.
1. Introduction
Liver cancer is a common disease that is often diagnosed only at an advanced stage, when very few patients can be cured. It is therefore necessary to diagnose liver cancer as early as possible. Computer Aided Diagnosis (CAD) technology has emerged with the development of computer graphics, image processing, and pattern recognition. Because the growing number of CT images increases the workload of doctors, research on this kind of technology is urgently needed. In recent years, researchers have studied several typical deformable models such as the Snake Model. These methods are better suited to objects with smooth boundaries and do not use valuable prior knowledge. The Active Texture Model (ATM), which evolved from them, can reflect the texture features of the object. However, the roughness of the surface in medical images is an important factor in distinguishing among lesions, so ATM, whose texture model relies only on gray-level features, cannot represent object features well.
To solve the problems above, we propose a statistical fractal model based on gray-level and fractal-dimension features. The statistical fractal model is better suited to the analysis of medical images, for example disease recognition, but constructing a statistical appearance model is challenging when the number of training samples is much smaller than the dimensionality of the data.
Principal Component Analysis (PCA) [1] is a well-known subspace method for recognition and one of the classical methods based on statistical features. However, it has two problems. First, the original spatial structure of the image is destroyed in the vectorization process. Second, converting an image into a vector may cause the curse of dimensionality, so that much more memory is needed to calculate the covariance matrix of the images. To solve these problems, we use Generalized N-dimensional PCA (GNDPCA) [2] to learn the subspace in this paper.
Support Vector Machine (SVM) [3] is commonly used to train a classifier, and its classification performance depends strongly on the parameters used. We therefore use the Ant Colony Optimization (ACO) algorithm to optimize the SVM parameters, and then we use a Directed Acyclic Graph (DAG) [4] for multiclass classification of liver diseases.
In summary, to preserve the spatial structure information of liver images and to avoid the curse of dimensionality, we extract gray-level and fractal features to establish the high-order tensor of the liver volume and construct the statistical fractal model with the GNDPCA method. To improve recognition accuracy, SVM optimized by ACO (ACOSVM) is used to recognize liver disease images.
This paper is organized as follows. Section 2 introduces the proposed method: first we review PCA and tensors, then we show the construction of the high-order tensor, and finally we introduce GNDPCA for the construction of the statistical fractal model and ACOSVM [5] for classification. Section 3 presents the reconstruction of liver images after GNDPCA and the classification results. Section 4 concludes the paper.
2. Materials and Methods
In this section, we first introduce some background knowledge for the GNDPCA method, mainly PCA, 2DPCA, and NDPCA, together with the basic knowledge of tensors. We then introduce our method for constructing the statistical fractal model. The main flow is shown in Figure 1. The process of the proposed method is described as follows.
2.1. PCA Method and Its Extension
PCA is an application of the Karhunen-Loève (K-L) transform in statistics. The purpose of PCA is to reduce the dimension of data by finding a linear mapping that meets the following conditions: (1) the error of sample reconstruction is minimized; (2) the mapping of the sample set into the low-dimensional space has the maximum variance; (3) the correlation among samples is removed.
Turk and Pentland proposed the well-known eigenface method to realize PCA. Suppose that we have a set of training images. First, each image is converted into a column vector, as shown in Figure 2, so that the training set becomes a set of column vectors whose dimension equals the number of pixels in an image. According to linear algebra, each such vector can be expressed with a basis of that vector space. If the whole training set is to be expressed by only one vector, the obvious choice is the mean of the sample vectors; we call this mean the zero-dimension expression of the sample dataset. This representation is simple and useful, but its shortcoming is that it cannot show the differences between samples. The second step of PCA is therefore to center the training set by subtracting the mean from each sample and then to find the expressions of the centered sample set from one dimension up to the chosen number of dimensions.
Compared with PCA, 2DPCA uses the two-dimensional image matrix directly for feature extraction, so the covariance matrix can be calculated accurately in less time. Let $X$ be a normalized $n$-dimensional column vector. Any $m \times n$ image $A$ can be projected onto it by $Y = AX$, which gives an $m$-dimensional projected feature vector $Y$. The discriminating power of $X$ can be measured by the total scatter of the projected samples, which can be expressed by the trace of the covariance matrix of the projected feature vectors: $J(X) = \operatorname{tr}(S_x)$, where $S_x$ is the covariance matrix of the projected feature vectors of the training samples and $\operatorname{tr}(S_x)$ is its trace. Maximizing $J(X)$ finds the projection direction along which the total scatter of the projected samples is largest.
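The 2DPCA projection described above can be illustrated with a short NumPy sketch. It follows the standard 2DPCA formulation (eigenvectors of the image scatter matrix); the function name `two_d_pca`, the random data, and the array sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def two_d_pca(images, d):
    """Return the d leading 2DPCA projection axes for a stack of images.

    images: array of shape (M, m, n); d: number of axes to keep (d <= n).
    Hypothetical helper for illustration only.
    """
    images = np.asarray(images, dtype=float)
    centered = images - images.mean(axis=0)            # subtract the mean image
    # Image scatter matrix G = (1/M) * sum_k (A_k - mean)^T (A_k - mean), shape (n, n)
    G = np.einsum("kij,kil->jl", centered, centered) / images.shape[0]
    eigvals, eigvecs = np.linalg.eigh(G)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]              # indices of the d largest
    return eigvecs[:, order]                           # columns are projection axes, shape (n, d)

# Toy data standing in for training images (assumed sizes)
rng = np.random.default_rng(0)
imgs = rng.random((20, 8, 6))
X = two_d_pca(imgs, d=2)
Y = imgs @ X   # each 8 x 6 image becomes an 8 x 2 feature matrix
```

Each image keeps its row structure after projection, which is why the resulting feature is a matrix rather than a vector.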
The advantage of the 2DPCA method is that images do not need to be transformed into vectors: the images themselves are used directly to find a group of basis vectors that best expresses the original samples. Moreover, the extracted feature is a matrix rather than the vector that PCA requires, so the spatial structure of the original images is kept. The method removes not only the correlation between samples but also the correlation between the rows within one sample. It also has shortcomings: because it ignores the differences between the columns within one sample, the mapping coefficient matrix is large and wastes a great deal of memory.
Alternative 2DPCA was proposed to overcome these problems. It accounts for the differences between the columns within one sample, but it still has a large coefficient matrix and cannot handle the differences between columns and rows at the same time. As a result, the G2DPCA method was proposed, which considers the correlation between both columns and rows. Its mapping applies a projection on both sides of the image matrix, which can be seen as mapping the rows first and then the columns, or the columns first and then the rows. G2DPCA also introduces an iterative scheme to obtain better results.
NDPCA was proposed for modeling high-dimensional data. The method is based on higher-order singular value decomposition (HOSVD) and treats the data as a high-order tensor. It effectively reduces the high computational cost, but, like 2DPCA, it has a large coefficient matrix.
2.2. The Basic Knowledge of Tensor
A tensor can be treated as the extension of a matrix: a vector is a first-order tensor and a matrix is a second-order tensor. If we stack several matrices of the same dimension, we obtain a cubic array called a third-order tensor. The analysis of high-order tensors uses the following mathematical operations [6].
Suppose that $\mathcal{A}$ is an $N$th-order tensor, $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, where $I_n$ is the dimension of mode $n$ of $\mathcal{A}$. An element of $\mathcal{A}$ is denoted $a_{i_1 i_2 \cdots i_N}$, for $1 \le i_n \le I_n$. The tensor product is defined as follows:
We can unfold the $N$th-order tensor $\mathcal{A}$ into a matrix $A_{(n)}$ by taking its mode-$n$ vectors as columns and arranging the remaining modes after mode $n$. The mode-$n$ product of a tensor and a matrix is defined as follows:
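In the usual definition, the mode-$n$ product of a tensor with a $J \times I_n$ matrix $U$ replaces the $n$th index by a sum, $(\mathcal{A} \times_n U)_{i_1 \cdots j \cdots i_N} = \sum_{i_n} a_{i_1 \cdots i_n \cdots i_N} u_{j i_n}$, which is equivalent to multiplying the mode-$n$ unfolding by $U$. The following NumPy sketch illustrates both operations; the helper names `unfold` and `mode_n_product` are assumptions, and the column ordering of the unfolding is one of several conventions in the literature.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: the mode-n vectors become the columns of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Mode-n product of a tensor with a (J x I_n) matrix."""
    # tensordot contracts the matrix columns with the chosen tensor mode;
    # the new axis of length J comes out first, so move it back into place.
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

A = np.arange(24, dtype=float).reshape(2, 3, 4)   # a small third-order tensor
U = np.ones((5, 3))                               # maps mode 1 from size 3 to size 5
B = mode_n_product(A, U, mode=1)                  # shape (2, 5, 4)
```

With these definitions, `unfold(B, 1)` equals `U @ unfold(A, 1)`, which is the matrix form of the mode-$n$ product.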
2.3. Construction of HighOrder Tensor
In this paper, we construct high-order tensors based on fractal theory. First, we use the blanket method [7] and the box-counting method [8] to calculate four groups of fractal features, and then we establish high-order tensors from the fractal features and the texture features at each pixel.
2.3.1. The Calculation of Fractal Feature
We use the blanket and box-counting methods to calculate the fractal features of liver images segmented by a doctor. A liver image and its segmentation result are shown in Figure 3.
Figure 3: (a) Original image; (b) segmentation result.
The first fractal feature is obtained by the blanket method. The image is treated as a hilly terrain surface whose height above the ground plane is proportional to the gray level. All points at distance $\varepsilon$ from the surface on both sides form a blanket of thickness $2\varepsilon$, and the estimated surface area is the volume of the blanket divided by $2\varepsilon$. For different $\varepsilon$, the blanket area can be estimated iteratively. The covering blanket is defined by its upper surface $u_\varepsilon$ and its lower surface $b_\varepsilon$; given the gray-level function $g(i,j)$, the surfaces are initialized as $u_0(i,j) = b_0(i,j) = g(i,j)$, and for $\varepsilon = 1, 2, 3, \ldots$ the blanket surfaces are defined as follows:
The volume of the blanket is defined as follows:
The surface area can be defined as follows:
Finally, the fractal feature can be described as (6), where $v_\varepsilon$ is the volume of the blanket:
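The blanket construction can be sketched in code following the formulation of Peleg et al. [7]: the upper surface grows by at least one gray level per step or to the maximum of its neighbours, the lower surface shrinks symmetrically, the volume is the summed gap between them, and the area is the volume divided by twice the blanket radius. The sketch below assumes 4-connected neighbours with replicated borders and estimates the fractal dimension from the slope of log area against log radius (area proportional to radius to the power 2 − D); these choices and all names are assumptions, not necessarily the authors' exact equations.

```python
import numpy as np

def neighbor_max(a):
    """Maximum over each pixel and its 4-connected neighbours (borders replicated)."""
    p = np.pad(a, 1, mode="edge")
    return np.maximum.reduce(
        [p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]]
    )

def blanket_fractal_dimension(gray, max_eps=8):
    """Estimate a fractal dimension of a gray-level image with the blanket method."""
    g = np.asarray(gray, dtype=float)
    upper, lower = g.copy(), g.copy()          # both surfaces start at the image itself
    log_eps, log_area = [], []
    for eps in range(1, max_eps + 1):
        upper = np.maximum(upper + 1, neighbor_max(upper))
        lower = np.minimum(lower - 1, -neighbor_max(-lower))   # neighbourhood minimum
        volume = np.sum(upper - lower)         # blanket volume at this radius
        area = volume / (2 * eps)              # surface area estimate
        log_eps.append(np.log(eps))
        log_area.append(np.log(area))
    slope = np.polyfit(log_eps, log_area, 1)[0]
    return 2.0 - slope                         # area ~ eps^(2 - D)
```

A perfectly flat image gives a dimension of 2, while rougher surfaces give larger values.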
The other fractal features are obtained by the box-counting method, which treats the gray-level image as a surface in three-dimensional space that is covered by boxes. The image plane is divided into a grid of cells, and a column of boxes is stacked over each cell along the gray-level axis. Within each cell, the maximum and the minimum gray levels of the image fall into two boxes of the column, and the number of boxes spanned between them is counted. Summing these counts over all cells gives the total number of boxes, and the fractal feature can then be defined as (7). In this paper, we obtain the remaining three fractal features by using different numbers of boxes such as 4, 8, and 16:
Different liver images differ greatly in texture coarseness: the fractal dimension changes little in smooth images and changes greatly in rough images. The fractal dimension is therefore a useful feature for liver disease classification.
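A minimal sketch of this box count at a single grid size is given below, in the spirit of the differential box-counting approach of [8]. It assumes a square image whose side is divisible by the cell size `s`, 256 gray levels, and a feature defined as log(total boxes) / log(1/r) with r = s / M; the function name and these conventions are illustrative assumptions, and the paper's exact parameterisation of "4, 8, and 16" may differ.

```python
import math

def dbc_feature(gray, s, levels=256):
    """Differential box-counting estimate at one grid size s.

    gray: square image as a list of rows; s must divide the image side.
    """
    M = len(gray)
    h = s * levels / M                 # box height along the gray-level axis
    total = 0
    for r0 in range(0, M, s):
        for c0 in range(0, M, s):
            cell = [gray[r][c] for r in range(r0, r0 + s) for c in range(c0, c0 + s)]
            k = int(min(cell) // h)    # index of the box holding the minimum
            l = int(max(cell) // h)    # index of the box holding the maximum
            total += l - k + 1         # boxes needed to cover this cell
    return math.log(total) / math.log(M / s)

flat = [[0] * 16 for _ in range(16)]
checker = [[255 if (r + c) % 2 else 0 for c in range(16)] for r in range(16)]
d_flat = dbc_feature(flat, s=4)        # smooth surface
d_rough = dbc_feature(checker, s=4)    # maximally rough surface
```

The flat image needs one box per cell and the checkerboard needs the full column of boxes per cell, so the two estimates sit at the extremes of 2 and 3.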
2.3.2. The Construction of HighOrder Tensor
In this paper, we use 50 groups of liver images of 512 × 512. After extracting the four kinds of fractal features, we extract the texture features, and all of the features obtained are used to establish the high-order tensors.
2.4. The Construction of Statistical Fractal Dimension Based on GNDPCA
We are given a series of zero-mean $N$th-order tensors $\mathcal{A}_i \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. We need to obtain a group of new $N$th-order tensors $\mathcal{B}_i \in \mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ ($J_n \le I_n$) whose reconstruction is as close to the original tensors as possible. The tensor images are defined by the texture and fractal features obtained from the segmented liver images. We use the Tucker model [9] to reconstruct the $N$th-order tensor as $\hat{\mathcal{A}}_i = \mathcal{B}_i \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)}$, where each $U^{(n)}$ is an orthogonal matrix. The reconstruction of a third-order tensor is shown in Figure 4.
The orthogonal matrices $U^{(n)}$ can be obtained by minimizing the cost function, which is shown as (8).
In (8), $i = 1, 2, \ldots, M$, where $M$ is the number of samples, and $\hat{\mathcal{A}}_i$ is the reconstructed tensor. There are two methods to minimize the cost function:
The first is to minimize the cost function directly and calculate the orthogonal matrices from the resulting expression, but this is difficult to compute. The second method is to maximize the function shown as (10), which is easier to compute. In this paper, we use the second method:
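One common way to carry out this kind of maximization, in the spirit of GNDPCA [2], is to update one mode matrix at a time: project every sample along all other modes, accumulate the scatter of the mode-$n$ unfoldings, and keep the leading eigenvectors. The NumPy sketch below illustrates that alternating scheme under these assumptions; the function names, the identity initialisation, and the default of two iterations are illustrative choices rather than the authors' exact implementation.

```python
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_n_product(t, m, mode):
    return np.moveaxis(np.tensordot(m, t, axes=(1, mode)), 0, mode)

def project(t, mats, skip=None):
    """Apply the transpose of each mode matrix along every mode except `skip`."""
    for n, U in enumerate(mats):
        if n != skip:
            t = mode_n_product(t, U.T, n)
    return t

def fit_mode_subspaces(samples, ranks, n_iter=2):
    """Alternating estimate of one orthonormal matrix per mode (zero-mean samples assumed)."""
    shape = samples[0].shape
    mats = [np.eye(I)[:, :J] for I, J in zip(shape, ranks)]   # simple initialisation
    for _ in range(n_iter):
        for n in range(len(shape)):
            S = np.zeros((shape[n], shape[n]))
            for A in samples:
                Z = unfold(project(A, mats, skip=n), n)       # other modes already reduced
                S += Z @ Z.T                                  # mode-n scatter matrix
            w, v = np.linalg.eigh(S)
            mats[n] = v[:, np.argsort(w)[::-1][:ranks[n]]]    # leading eigenvectors
    return mats

def reconstruct(A, mats):
    core = project(A, mats)                  # core tensor of the sample
    for n, U in enumerate(mats):
        core = mode_n_product(core, U, n)    # Tucker-style reconstruction
    return core
```

When the mode-subspace ranks equal the original dimensions, every mode matrix is square and orthogonal, so the reconstruction reproduces the input exactly; smaller ranks trade accuracy for compression.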
2.5. Construction of the Classification of Liver Diseases
In this paper, ACO is used to optimize the SVM that is trained as a liver disease classifier, and the DAG structure for multiclass classification is used to distinguish liver diseases.
2.5.1. Feature Selection Based on Liver Statistical Fractal Model
The samples consist of the core tensor of each tensor. We convert each core tensor into a one-dimensional vector using a nonlinear dimensionality reduction method [10]. The training set contains one such vector per sample, and each vector contains one entry per feature.
2.5.2. Feature Weighting
The selected gray-level features greatly outnumber the fractal features, so we give a higher weight to the fractal features. A series of experiments showed that the classification accuracy is much better when the weight of the fractal features is 0.6 and the weight of the gray-level features is 0.4.
2.5.3. Construction of Classifier Based on ACOSVM
Diseases such as cirrhosis and hepatic cyst differ from cancer but are often confused with it in CAD. SVM is inherently a binary classifier, so classifying four kinds of liver disease requires combining several SVMs. In this paper we use the directed acyclic graph method (DAGSVM) to realize the multiclass classification of liver diseases; the DAG is shown in Figure 5.
Classifying four kinds of liver disease requires six pairwise SVMs. Each SVM has two parameters: the penalty factor and the parameter of the kernel function. To optimize these two parameters with the ACO algorithm, they must first be discretized. In this paper, each parameter is discretized into significant digits, the number of which is determined empirically: the penalty factor and the kernel parameter each have five significant digits, and each digit can take a value from 0 to 9. For the penalty factor, the leading digit is the hundreds place, so its value ranges from 0 to 999.99. For the kernel parameter, the leading digit is the ones place, so its value ranges from 0 to 9.9999.
The heuristic information is then set to 1. Classification accuracy is used to evaluate SVM performance, so the pheromone increment used in the global update process is determined by the pheromone intensity and the maximal classification accuracy in each cycle. The whole process is executed as follows.
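The DAG decision procedure of [4] can be sketched as follows: the candidate list starts with all classes, each node compares the first and last remaining candidates with the corresponding pairwise classifier, and the losing class is eliminated until one remains. The pairwise classifiers below are toy nearest-centre rules on a single number, used only so the sketch runs; they, the centre values, and the function names are assumptions standing in for trained SVMs.

```python
from itertools import combinations

def dag_predict(x, classes, pairwise):
    """Evaluate a decision DAG; pairwise[(a, b)](x) returns the winning label a or b."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise[(a, b)](x)
        remaining.remove(b if winner == a else a)   # eliminate the losing class
    return remaining[0]

def make_toy_classifier(a, b, centers):
    """Hypothetical stand-in for a trained binary SVM."""
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b

classes = ["normal", "cirrhosis", "cancer", "hydatoncus"]
centers = {"normal": 0.0, "cirrhosis": 1.0, "cancer": 2.0, "hydatoncus": 3.0}  # toy values
pairwise = {(a, b): make_toy_classifier(a, b, centers) for a, b in combinations(classes, 2)}

label = dag_predict(1.2, classes, pairwise)   # three pairwise decisions are evaluated
```

For four classes the dictionary holds six pairwise classifiers, but any single prediction evaluates only three of them.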
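The digit encoding can be made concrete with a small sketch: ten decimal digits form one candidate solution, the first five giving the penalty factor (hundreds down to hundredths) and the last five giving the kernel parameter (ones down to ten-thousandths). The names `decode_path`, `C`, and `sigma` are assumptions for illustration; the count of pairwise SVMs for k classes is k(k − 1)/2.

```python
def digits_to_int(digits):
    value = 0
    for d in digits:
        value = value * 10 + d
    return value

def decode_path(digits):
    """Map ten decimal digits to (penalty factor, kernel parameter)."""
    if len(digits) != 10 or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("expected ten digits, each between 0 and 9")
    C = digits_to_int(digits[:5]) / 100        # leading digit is the hundreds place
    sigma = digits_to_int(digits[5:]) / 10000  # leading digit is the ones place
    return C, sigma

def n_pairwise(k):
    """Number of binary classifiers needed for k classes in a pairwise scheme."""
    return k * (k - 1) // 2

C_max, sigma_max = decode_path([9] * 10)
C, sigma = decode_path([1, 2, 3, 4, 5, 0, 5, 0, 0, 0])
```

The all-nines path reproduces the upper limits of both ranges, 999.99 and 9.9999.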
Step 1. Discretize the penalty factor and the kernel parameter by the method described above.
Step 2. Initialize the pheromone and the pheromone increment.
Step 3. Execute the search process for the first best path.
(1) Place the ants at the origin of coordinates.
(2) Move each ant randomly to a next city whose coordinate differs from those of the previously visited cities.
(3) Modify the pheromone of the transfer path for each ant according to the local update rule.
(4) If all the ants have finished visiting 10 nodes, modify the pheromone of the path of the best ant according to the global update rule; otherwise return to (2).
Step 4. Place the ants at the origin of coordinates again.
Step 5. Move each ant to the next city chosen according to the state transition rule.
Step 6. Modify the pheromone of the transfer path for each ant according to the local update rule. If the ants have finished the tour, go to Step 7; otherwise return to Step 5.
Step 7. Train an SVM classifier with the penalty factor and kernel parameter obtained by each ant. Find the best ant, which produced the highest accuracy, and modify the pheromone for the best ant according to the global update rule. If the accuracy meets the termination condition or the number of loops exceeds the maximum number of cycles, go to Step 8; otherwise return to Step 4.
Step 8. Output the best penalty factor, the best kernel parameter, and the maximum accuracy.
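A simplified sketch of this search loop is shown below. It assumes ant-colony-system style local and global updates, selects each digit with probability proportional to its pheromone (the heuristic is 1), and omits the separate random first pass of Step 3. The callable `evaluate` stands in for training an SVM and returning its accuracy; all parameter names and default values are illustrative assumptions.

```python
import random

def aco_search(evaluate, n_ants=10, n_cycles=20, rho=0.1, alpha=0.1, Q=1.0, seed=0):
    """Search ten-digit paths; evaluate(path) must return an accuracy in [0, 1]."""
    rng = random.Random(seed)
    tau0 = 0.1
    tau = [[tau0] * 10 for _ in range(10)]          # tau[position][digit]
    best_path, best_acc = None, -1.0
    for _ in range(n_cycles):
        cycle_best, cycle_acc = None, -1.0
        for _ in range(n_ants):
            path = []
            for pos in range(10):
                # State transition: the heuristic is 1, so only pheromone matters
                digit = rng.choices(range(10), weights=tau[pos])[0]
                path.append(digit)
                # Local update pulls the visited edge back toward tau0
                tau[pos][digit] = (1 - rho) * tau[pos][digit] + rho * tau0
            acc = evaluate(path)
            if acc > cycle_acc:
                cycle_best, cycle_acc = path, acc
        # Global update reinforces the best ant of this cycle by Q * accuracy
        for pos, digit in enumerate(cycle_best):
            tau[pos][digit] = (1 - alpha) * tau[pos][digit] + alpha * Q * cycle_acc
        if cycle_acc > best_acc:
            best_path, best_acc = cycle_best, cycle_acc
    return best_path, best_acc

# Toy objective standing in for SVM accuracy: fraction of digits matching a target
target = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
def toy_accuracy(path):
    return sum(p == t for p, t in zip(path, target)) / 10

path, acc = aco_search(toy_accuracy, seed=1)
```

In a real run, `evaluate` would decode the ten digits into the two SVM parameters, train the classifier, and return its validation accuracy.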
3. Results and Discussion
We select 120 groups of liver images: 60 groups of normal liver volumes, 20 groups of cirrhosis liver volumes, 20 groups of cancer liver volumes, and 20 groups of hydatoncus liver volumes. There are 50 images in each group. The thickness of each image is 3 mm, and the resolution is 512 × 512. Of the 120 groups, half are selected as training samples and the others as testing samples.
3.1. Reconstruction Results after GNDPCA
In this paper, we use the leave-one-out method to test the generalization ability of the models constructed without fractal features for liver volumes. One of the images is shown in Figure 6; the tumor is located in the lower left corner of the liver image. First, one volume is excluded from the training data used to construct the model, and then it is reconstructed by the trained model for checking.
The volume is reconstructed with mode-subspace dimensions from 5 × 5 × 3 to 300 × 300 × 30, as shown in Figure 7. In Figure 7, the first row is the reconstruction of slice 3, the second row is the reconstruction of slice 13, and the third row is the reconstruction of slice 23. Column (a) is the original liver image; columns (b) to (e) are the reconstructions with mode-subspace dimensions of 5 × 5 × 3, 100 × 100 × 10, 200 × 200 × 20, and 300 × 300 × 30, respectively; and column (f) is the volume reconstructed with the PCA eigenface method for comparison.
Since the dimension of the original volume is 512 × 512 × 50, we can calculate the compression rate for each case: 0.0006%, 0.7629%, 6.1035%, and 20.5994%, respectively. The reconstruction result improves as the dimension of the mode-subspace grows. Because of overfitting, the PCA method performs worse than GNDPCA.
GNDPCA needs few iterations, as shown in Figure 8: the value of the cost function does not change dramatically after two iterations. Therefore, we set the number of GNDPCA iterations to two in our experiment.
Figure 9 shows the relationship between the original volume and the reconstructed volume. On the abscissa, a is the mode-subspace of 5 × 5 × 3, b is 100 × 100 × 10, c is 200 × 200 × 20, d is 300 × 300 × 30, e is 400 × 400 × 40, and f is 512 × 512 × 50.
The normalized correlation grows with the mode-subspace size. When the mode-subspace size is 512 × 512 × 50, the normalized correlation is 1, which means that the original volume can be reconstructed without any error. The normalized correlation is defined as (11), where $\mathcal{A}$ is the original tensor volume and $\hat{\mathcal{A}}$ is the volume after reconstruction:
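The quoted compression rates follow from dividing the number of elements in the mode-subspace by the number of elements in the original 512 × 512 × 50 volume. A short check, with the function name `compression_rate` as an illustrative assumption:

```python
import math

def compression_rate(subspace, original=(512, 512, 50)):
    """Size of the mode-subspace as a percentage of the original volume size."""
    return 100.0 * math.prod(subspace) / math.prod(original)

rates = [
    round(compression_rate(s), 4)
    for s in [(5, 5, 3), (100, 100, 10), (200, 200, 20), (300, 300, 30)]
]
# rates == [0.0006, 0.7629, 6.1035, 20.5994]
```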
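A commonly used form of normalized correlation divides the sum of element-wise products of the two volumes by the square root of the product of their summed squares, so identical volumes score 1. The sketch below uses that form as an assumption, since the exact expression of (11) may differ; volumes are passed as flat sequences of voxel values.

```python
import math

def normalized_correlation(original, reconstructed):
    """Normalized correlation between two equally sized, flattened volumes."""
    num = sum(a * b for a, b in zip(original, reconstructed))
    den = math.sqrt(sum(a * a for a in original) * sum(b * b for b in reconstructed))
    return num / den

nc_same = normalized_correlation([1, 2, 3], [1, 2, 3])      # identical volumes
nc_orth = normalized_correlation([1, 0], [0, 1])            # no overlap
```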
3.2. Results of Classification
The result of each SVM in the ACOSVM multiclassifier is shown in Table 1. The table shows that the statistical fractal model achieves better accuracy than the statistical texture model without fractal features.

Compared with the other classifiers, ACOSVM with the weighted fractal feature has better accuracy, as shown in Figure 10. BPNN is a BP neural network, with an accuracy of 69.23%. FL is the Fisher linear classifier, with an accuracy of 46.23%. KNN is the k-Nearest Neighbor algorithm, with an accuracy of 47.23%. SVM is the conventional SVM, with an accuracy of 62.68%. ACOSVM is the conventional ACOSVM, with an accuracy of 89.87%. FACOSVM is ACOSVM with fractal features, with an accuracy of 91.43%. WFACOSVM is ACOSVM with the weighted fractal feature, with an accuracy of 93.06%. As Figure 10 shows, ACOSVM performs better than the other classifiers, and using the weighted fractal feature in our statistical fractal model further improves the accuracy of liver disease classification.
4. Conclusions
In this paper, we have presented the construction of high-order tensors from weighted fractal dimension features and gray-level features. GNDPCA, a subspace learning method, has been used to obtain the core tensor from those high-order tensors and to establish the statistical fractal model for the subsequent classification. ACOSVM has been used to train a liver image classifier. Applied to the classification of liver diseases, the method using statistical fractal models based on GNDPCA and ACOSVM achieved better classification accuracy, because statistical fractal models based on GNDPCA preserve as much information of the original image as possible and ACO can find suitable parameters for SVM. In conclusion, with a small number of samples, the classifier of this paper achieves better recognition accuracy than others such as BPNN, the conventional SVM, and the conventional ACOSVM. The proposed method can therefore improve the classification accuracy of liver diseases and assist doctors in diagnosing liver diseases.
Acknowledgment
The research is supported by the National Natural Science Foundation of China (no. 61272176, no. 60973071).
References
[1] M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), pp. 586–591, June 1991.
[2] R. Xu and Y. W. Chen, “Generalized N-dimensional principal component analysis (GNDPCA) and its application on construction of statistical appearance models for medical volumes with fewer samples,” Neurocomputing, vol. 72, no. 10–12, pp. 2276–2287, 2009.
[3] B. Liu, Z. F. Hao, and X. W. Yang, “Nesting support vector machine for muticlassification,” in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 7, pp. 4220–4225, August 2005.
[4] J. C. Platt, N. Cristianini, and J. Shawe-Taylor, “Large margin DAGs for multiclass classification,” Advances in Neural Information Processing Systems, vol. 12, no. 3, pp. 547–553, 2000.
[5] X. Liu, H. Jiang, and F. Tang, “Parameters optimization in SVM based on ant colony optimization algorithm,” Advanced Materials Research, vol. 121-122, pp. 470–475, 2010.
[6] D. Tao, X. Li, X. Wu, and S. J. Maybank, “General tensor discriminant analysis and Gabor features for gait recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1700–1715, 2007.
[7] S. Peleg, J. Naor, R. Hartley, and D. Avnir, “Multiple resolution texture analysis and classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 4, pp. 518–523, 1984.
[8] N. Sarkar and B. B. Chaudhuri, “An efficient differential box-counting approach to compute fractal dimension of image,” IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 1, pp. 115–120, 1994.
[9] L. de Lathauwer, B. de Moor, and J. Vandewalle, “On the best rank-1 and rank-(R_1, R_2, ..., R_N) approximation of higher-order tensors,” SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.
[10] H. Eghbalnia, A. Assadi, and J. Carew, “Nonlinear methods for clustering and reduction of dimensionality,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '99), vol. 2, pp. 1004–1009, July 1999.
Copyright
Copyright © 2013 Huiyan Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.