Special Issue: Frontiers in Integrative Genomics and Translational Bioinformatics
Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA
A novel method is proposed to establish a pancreatic cancer classifier. First, quantum concepts and the fruit fly optimization algorithm (FOA) are introduced. Then FOA is improved by quantum coding and quantum operations, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of a support vector machine (SVM), and the classifier is established with the optimized SVM. To verify the effectiveness of the proposed method, the standard SVM and other classification methods are chosen for comparison. The experimental results show that the proposed method improves classifier performance and costs less time than the ACO- and QSA-based optimization methods.
Pancreatic cancer is one of the world's top 10 malignant tumors. Its early and accurate diagnosis is difficult: by the time the diagnosis is confirmed, the tumor has typically reached an advanced stage. Early detection, early diagnosis, and early treatment are therefore of great significance for improving prognosis. With the development of computer science and computer image-processing technology, computer-aided detection (CAD) technology has been established. CAD systems are increasingly used by radiologists as an aid for the detection and interpretation of diseases, reducing the burden on doctors and improving diagnostic accuracy.
Image recognition is one of the most important parts of CAD technology. The recognition process is mainly divided into two phases, namely, feature extraction and selection, and classifier construction. In our previous work, we argued that tensors can describe spatial information among image features and need less space than vectors. The multilinear principal component analysis (MPCA) method can be used to select the core tensors. In this paper, we also use tensors to represent CT images and MPCA to select core tensors, which reduces the tensor dimension.
There are many methods for establishing a medical image classifier. Kovalerchuk et al. and Pendharkar et al. used machine learning and data mining technology in breast cancer detection. In recent years, medical image classification has been studied extensively. Antonie et al. combined association rules and neural networks to mine texture features in different regions of breast images and realized the automatic diagnosis of breast cancer. Zhang et al. classified lymph nodes of the uterine cervix using a support vector machine (SVM) with size and shape features. Ramírez et al. proposed using a neural network method to classify brain images for Alzheimer's disease.
However, research on pancreatic cancer classification is still in a fledgling period. Tsai and Kojma proposed a pancreatic tiny anomaly detection method for CT images and introduced the square of a logarithm operation in grayscale to enhance the margin of low grayscale. Takada et al. proposed a new pancreatic classification system that distinguishes the four parts of the pancreas based on its anatomy and their own experience. He et al. proposed a novel group search optimizer- (GSO-) based biomarker discovery method for pancreatic cancer diagnosis using mass spectrometry data; compared with a genetic algorithm, evolution strategies, evolutionary programming, and a particle swarm optimizer, it achieved better classification performance.
Theoretically, CAD technology can be applied to the imaging examination of any body tissue or organ to improve diagnostic accuracy. However, because the pancreas lies in a concealed position and has complex relationships with neighboring organs, pancreatic cancer image classification is difficult.
In this paper, we employ SVM, which is suitable for small-sample, nonlinear, and high-dimensional problems, to establish the pancreatic cancer classifier, and we improve the fruit fly optimization algorithm (FOA) to optimize the parameters of SVM. We provide a new fitness function, evaluated in $k$-fold cross-validation, that is more in line with actual clinical needs for assessing classifier performance. Using these strategies, the classification performance can be improved. Experimental results on pancreatic regions of abdominal CT images demonstrate the feasibility and efficiency of the proposed method.
This paper is organized as follows. Section 2 introduces the background of this research, including the support vector machine, the fruit fly optimization algorithm, and the concept of quantum. Section 3 describes the construction of the SVM classifier based on improved FOA. Section 4 presents the experimental data and the evaluation criteria, shows the results of pancreatic cancer classification based on the improved FOA and other comparative methods, and discusses the experimental results. Section 5 concludes the paper.
We introduce SVM and FOA in this section; for the concept of quantum, see the cited literature.
2.1. Support Vector Machine
Support vector machine (SVM) is built on statistical learning theory. It is suitable for small-sample, nonlinear, and high-dimensional problems. SVM is based on the principle of structural risk minimization and has strong generalization ability. It seeks the optimal separating hyperplane in a high-dimensional feature space for sample classification.
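SVM mainly aims at binary classification. For a linearly separable problem, we consider samples $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where $x_i$ is the feature vector of a medical image, $y_i \in \{-1, +1\}$ is the label of the sample, and $n$ is the number of samples. The optimal separating hyperplane is $\mathbf{w} \cdot x + b = 0$. The functional margin, which measures the distance from a sample point to the separating hyperplane, is $\hat{\delta}_i = y_i (\mathbf{w} \cdot x_i + b)$. The geometrical margin is obtained by normalizing $\mathbf{w}$, and it is simplified as $\delta = 1 / \|\mathbf{w}\|$. The objective is to obtain the maximum value of $\delta$, which is equivalent to obtaining the minimum value of $\frac{1}{2}\|\mathbf{w}\|^2$. Finally, the problem translates into the quadratic programming problem in (1), where $C$ is the penalty coefficient and $\xi_i$ is the slack variable:
$$\min_{\mathbf{w}, b, \xi} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (\mathbf{w} \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, 2, \ldots, n. \tag{1}$$
The optimal separating function is shown as
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right),$$
where $\alpha_i$ are the Lagrange multipliers of the dual problem.
For a nonlinear problem, the kernel function $K(x_i, x_j)$ is used to translate the nonlinear problem in a low-dimensional space into a linear problem in a high-dimensional space. The optimal separating function is shown as
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right).$$
Linear kernel is
$$K(x_i, x_j) = x_i \cdot x_j.$$
Polynomial kernel is
$$K(x_i, x_j) = (\gamma \, x_i \cdot x_j + r)^d.$$
RBF kernel is
$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right).$$
Sigmoid kernel is
$$K(x_i, x_j) = \tanh(\gamma \, x_i \cdot x_j + r).$$
Here $\gamma$, $r$, and $d$ denote the kernel parameters.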
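As an illustration of the kernels listed above, the following minimal Python sketch implements the four kernels and the kernel form of the separating function. The parameter names `gamma`, `coef0`, and `degree` follow common SVM library conventions and are assumptions here, and the toy support vectors, labels, and multipliers are made up for illustration rather than taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def decide(x, support_vectors, labels, alphas, bias, kernel):
    """Sign of the kernel expansion sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + bias >= 0 else -1


# Hypothetical toy model: one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
print(decide((1.9, 2.1), support_vectors, labels, alphas, 0.0, rbf_kernel))
```

With the RBF kernel, a query point is assigned the label of the support vector it lies closest to in this toy setting, which is why the kernel width `gamma` has such a strong effect on the decision boundary.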
The parameters used in SVM are the main factor influencing recognition performance. At present, the common methods for selecting optimal parameters include grid search, genetic algorithm (GA), and particle swarm optimization (PSO). Dorigo et al. proposed the ant colony optimization (ACO) algorithm to select optimal parameter values, achieving better classification performance at the cost of more time. In [20, 21], Xu et al. and Tiwari and Vidyarthi proposed the quantum genetic algorithm (QGA) to optimize SVM parameters and verified that quantum operations can enlarge the search space and provide good searching ability. Jiang et al. used the quantum simulated annealing (QSA) algorithm, which combines QGA and the simulated annealing (SA) algorithm, to optimize SVM parameters, tested the classification model on pancreatic images, and achieved better and more stable accuracy.
2.2. Fruit Fly Optimization Algorithm
The key steps of FOA are shown as follows.
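Step 1. The position of the population, $(X_{\mathrm{axis}}, Y_{\mathrm{axis}})$, is randomly initialized. $X_{\mathrm{axis}}$ and $Y_{\mathrm{axis}}$ are the abscissa and ordinate of the population's position, respectively.
Step 2. For each fruit fly, a random flying direction and distance are assigned, as in (9). $(X_i, Y_i)$ is the new position of each fruit fly, where $i = 1, 2, \ldots, m$ and $m$ is the number of fruit flies in the population:
$$X_i = X_{\mathrm{axis}} + \mathrm{RandomValue}, \quad Y_i = Y_{\mathrm{axis}} + \mathrm{RandomValue}. \tag{9}$$
Step 3. The distance from each fruit fly to the origin and the smell concentration determination value of each fruit fly are calculated as
$$\mathrm{Dist}_i = \sqrt{X_i^2 + Y_i^2}, \quad S_i = \frac{1}{\mathrm{Dist}_i}.$$
Step 4. The smell concentration determination value is used in the smell concentration determination function (fitness function) to calculate the smell concentration value as
$$\mathrm{Smell}_i = \mathrm{Function}(S_i).$$
Step 5. The fruit fly which has the best smell concentration is found in the population (the maximum is replaced by the minimum for a minimization problem):
$$[\mathrm{bestSmell}, \mathrm{bestIndex}] = \max(\mathrm{Smell}).$$
Step 6. The best smell concentration and its position are saved. The fruit fly population moves to this position by vision.
Step 7. Steps 2 to 5 are iterated. If the smell concentration is better than the previous one, Step 6 is executed.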
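The seven steps above can be sketched in Python as follows. This is a minimal illustration of basic FOA for a minimization problem, not the authors' implementation: the function name `foa_minimize`, the initial range `[0, 10]`, the step size, the population size, and the toy fitness function are all assumptions chosen for the example.

```python
import math
import random


def foa_minimize(fitness, n_flies=20, n_iter=100, step=1.0, seed=0):
    """Basic FOA sketch: returns the best S found and its smell value."""
    rng = random.Random(seed)
    # Step 1: random initial position of the population (assumed range).
    x_axis, y_axis = rng.uniform(0, 10), rng.uniform(0, 10)
    best_smell, best_s = math.inf, None
    for _ in range(n_iter):
        candidates = []
        for _ in range(n_flies):
            # Step 2: random direction and distance for each fly.
            x = x_axis + rng.uniform(-step, step)
            y = y_axis + rng.uniform(-step, step)
            # Step 3: distance to the origin and its reciprocal S.
            dist = max(math.hypot(x, y), 1e-12)  # guard against division by zero
            s = 1.0 / dist
            # Step 4: smell concentration from the fitness function.
            candidates.append((fitness(s), s, x, y))
        # Step 5: best fly of this generation (smallest smell value here).
        smell, s, x, y = min(candidates)
        # Step 6: keep the best value and move the population there.
        if smell < best_smell:
            best_smell, best_s = smell, s
            x_axis, y_axis = x, y
    # Step 7 is the loop itself.
    return best_s, best_smell


# Toy fitness with its minimum at S = 0.25 (purely illustrative).
best_s, best_smell = foa_minimize(lambda s: (s - 0.25) ** 2)
print(best_s, best_smell)
```

Because the candidate solution is always the reciprocal of a distance, a single run explores only a one-dimensional quantity, which is one reason the search can behave blindly near the optimum.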
FOA is an intelligent optimization algorithm. It is easy to set up, easy to implement, and fast to optimize, but it also has some problems. In the parameter initialization phase, FOA uses a randomized strategy to determine the initial position. In the position update phase of individual fruit flies, a blind search strategy is used. As a result, it converges slowly and easily falls into local extrema. At present, there are a number of evaluation criteria for classifier performance. In classifier optimization algorithms, classification accuracy and error rate are commonly used as the fitness function. However, those criteria cannot reflect clinical prior knowledge: they evaluate only a single operating point and are not robust when the class distribution changes.
The whole procedure of the proposed method is shown in Figure 2. The detailed process of the proposed method is as follows.
(1) Feature Extraction. We extract gray and fractal dimension features from the segmented pancreatic images, and then we normalize those features.
(2) High Order Tensors Construction. High order tensors are constructed based on the extracted features to represent pancreatic images.
(3) Feature Selection. In this paper we use the MPCA method to extract the eigen tensors for classification.
(4) Pancreatic Cancer Classification. After obtaining the eigen tensors by MPCA, we treat them as input samples and use SVM optimized by improved FOA to train the classification model for pancreatic cancer images.
In this process, high order tensor construction and feature selection are carried out in accordance with our previous work, so we do not discuss them further in this paper.
3.1. Improved FOA
To address the problems of FOA described above, we introduce quantum coding into FOA and define a new fitness function as the smell concentration determination function.
3.1.1. Quantum Fruit Fly Coding
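In improved FOA (IFOA), quantum phase is used to code the fruit flies' positions. Compared with FOA with the same number of fruit flies, the solution search space of quantum fruit flies is double that of the original fruit flies. The position of the quantum fruit fly population is shown in (13). When initializing, the quantum bit phase angle is $\theta_{ij} = 2\pi \times \mathrm{rand}$, where $\mathrm{rand}$ is a random number in $(0, 1)$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, D$, $m$ is the number of fruit flies, and $D$ is the dimension of the optimization problem. In this paper, $D = 2$:
$$P_i = \begin{bmatrix} \cos\theta_{i1} & \cos\theta_{i2} & \cdots & \cos\theta_{iD} \\ \sin\theta_{i1} & \sin\theta_{i2} & \cdots & \sin\theta_{iD} \end{bmatrix}. \tag{13}$$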
3.1.2. Quantum Fruit Fly Smell Concentration Determination Value
As quantum phase is used to code the fruit flies' positions, each fruit fly has two solutions, namely, the cosine solution and the sine solution. The distance from the $i$th fruit fly to the origin and the smell concentration determination value of the $i$th fruit fly are calculated as in (14) and (15), where $i = 1, 2, \ldots, m$ and $m$ is the number of fruit flies in the population. In (15), Dist is normalized to $[0, 1]$ and then assigned to $S$, which facilitates parameter zooming when optimizing SVM.
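The phase coding described above can be sketched as follows. This is only an illustration under stated assumptions: each fly is assumed to hold one phase angle per coordinate, drawn uniformly from $[0, 2\pi)$ as is common in phase-coded quantum-inspired algorithms, and the helper names `init_phase_angles` and `decode` are hypothetical. The sketch stops at decoding the two solutions and does not attempt to reproduce the paper's distance normalization.

```python
import math
import random


def init_phase_angles(n_flies, n_dims=2, seed=0):
    """Draw one phase angle per fly and per dimension, uniformly in [0, 2*pi)."""
    rng = random.Random(seed)
    return [[2.0 * math.pi * rng.random() for _ in range(n_dims)]
            for _ in range(n_flies)]


def decode(angles):
    """Return the cosine solution and the sine solution coded by one fly."""
    cosine_solution = [math.cos(t) for t in angles]
    sine_solution = [math.sin(t) for t in angles]
    return cosine_solution, sine_solution


population = init_phase_angles(n_flies=5)
for angles in population:
    cosine_solution, sine_solution = decode(angles)
    print(cosine_solution, sine_solution)
```

Since every angle yields both a cosine and a sine value, a population of a given size covers twice as many candidate positions as the same number of ordinary fruit flies.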
3.1.3. Quantum Fruit Fly Smell Concentration Determination Function
False negative rate (FNR) is known as the rate of missed diagnosis. It is the percentage of actually sick patients who are identified as disease-free. FNR is the complement of the actual diagnostic sensitivity. False positive rate (FPR) is known as the misdiagnosis rate. It is the percentage of actually disease-free patients who are identified as sick. FPR is the complement of the actual diagnostic specificity. In actual disease diagnosis, the higher the sensitivity of a diagnosis, the lower the rate of missed diagnosis; that is to say, FNR is low. When a diagnosis with high specificity is used, the misdiagnosis rate is low; that is to say, FPR is low. Therefore, in improved FOA, the mean of the weighted sum of FNR and FPR in $k$-fold cross-validation is used as the smell concentration determination function. It is shown as
$$\mathrm{Smell} = \frac{1}{k} \sum_{j=1}^{k} \left[ \omega \cdot \mathrm{FNR}_j + (1 - \omega) \cdot \mathrm{FPR}_j \right]. \tag{16}$$
In (16), $k$ is the parameter of $k$-fold cross-validation, $\omega$ is the weight of FNR, and $\mathrm{FNR}_j$ and $\mathrm{FPR}_j$ are the rates obtained on the $j$th fold. A fruit fly with a smaller smell concentration value is better.
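The fitness described above, the mean over folds of a weighted sum of FNR and FPR, can be sketched as follows. The sketch assumes that the weight of FPR is one minus the weight of FNR, which the text does not state explicitly; the function names and the two toy folds are hypothetical.

```python
def fnr_fpr(y_true, y_pred, positive=1):
    """False negative rate and false positive rate for one fold."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


def smell_concentration(folds, fnr_weight):
    """Mean over folds of fnr_weight * FNR + (1 - fnr_weight) * FPR.

    folds is a list of (y_true, y_pred) pairs, one per validation fold.
    The (1 - fnr_weight) weight on FPR is an assumption of this sketch.
    """
    total = 0.0
    for y_true, y_pred in folds:
        fnr, fpr = fnr_fpr(y_true, y_pred)
        total += fnr_weight * fnr + (1.0 - fnr_weight) * fpr
    return total / len(folds)


# Two hypothetical folds: one missed positive, then one false alarm.
folds = [([1, 1, 0, 0], [1, 0, 0, 0]),
         ([1, 0, 0, 0, 0], [1, 1, 0, 0, 0])]
print(smell_concentration(folds, fnr_weight=0.9))
```

With a weight close to 1, a missed diagnosis is penalized far more heavily than a false alarm, so the optimizer is pushed toward parameter settings with high sensitivity.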
3.1.4. Quantum Fruit Fly Mutation Operation
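The quantum NOT gate is used to randomly change the quantum fruit flies' positions. It not only increases the diversity of the population but also avoids premature convergence. The quantum NOT gate based on phase coding is shown as
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \cos\theta_{ij} \\ \sin\theta_{ij} \end{bmatrix} = \begin{bmatrix} \sin\theta_{ij} \\ \cos\theta_{ij} \end{bmatrix} = \begin{bmatrix} \cos\left(\frac{\pi}{2} - \theta_{ij}\right) \\ \sin\left(\frac{\pi}{2} - \theta_{ij}\right) \end{bmatrix}, \tag{17}$$
where $\theta_{ij}$ is the quantum bit phase angle.
The mutation probability of an individual fruit fly is $p_m$. If $p_m$ is greater than a random number within $(0, 1)$, the two probability amplitudes of the $X$-coordinate or $Y$-coordinate of the randomly selected individual fruit fly are exchanged by the quantum NOT gate.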
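The mutation described above can be sketched in Python as follows. Exchanging the two probability amplitudes of a phase-coded coordinate is equivalent to replacing its angle by $\pi/2$ minus the angle. The per-fly mutation test and the random choice between the two coordinates are one reading of the text, and the names `quantum_not`, `mutate`, and `p_m` are hypothetical.

```python
import math
import random


def quantum_not(theta):
    """Swap the amplitudes (cos t, sin t) -> (sin t, cos t) via t -> pi/2 - t."""
    return math.pi / 2.0 - theta


def mutate(population, p_m, rng):
    """Return a mutated copy of a population of per-fly phase angle lists."""
    mutated = [angles[:] for angles in population]
    for angles in mutated:
        # Mutate this fly if p_m exceeds a random number in [0, 1).
        if p_m > rng.random():
            j = rng.randrange(len(angles))  # pick the X or the Y coordinate
            angles[j] = quantum_not(angles[j])
    return mutated


population = [[0.3, 1.2], [2.0, 4.0]]
print(mutate(population, p_m=0.5, rng=random.Random(0)))
```

The input population is left untouched, so the mutated positions can be compared against the originals before deciding whether to accept them.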
The acceptance probability of a mutated new fruit fly position obeys the Boltzmann probability distribution, as given in (18).
The quantities in (18) are the mutated new fruit fly position, the original fruit fly position, the smell concentration determination function, and the number of iterations. If the new position has a better (smaller) smell concentration than the original position, it is accepted with probability 1. Otherwise, it is accepted with the probability given by (18).
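A Boltzmann-style acceptance rule of the kind described above can be sketched as follows. The source states only that the probability depends on the two smell concentration values and on the iteration count, so the temperature schedule `t0 / (1 + iteration)` used here is purely an assumption for illustration, as are the function names.

```python
import math
import random


def acceptance_probability(f_new, f_old, iteration, t0=1.0):
    """Probability of accepting a mutated position (smaller f is better)."""
    if f_new < f_old:
        return 1.0
    temperature = t0 / (1.0 + iteration)  # assumed cooling schedule
    return math.exp(-(f_new - f_old) / temperature)


def accept(f_new, f_old, iteration, rng, t0=1.0):
    """Draw the accept/reject decision for one mutated position."""
    return rng.random() < acceptance_probability(f_new, f_old, iteration, t0)


rng = random.Random(0)
print(acceptance_probability(0.10, 0.20, iteration=5))
print(acceptance_probability(0.25, 0.20, iteration=5))
print(accept(0.25, 0.20, iteration=5, rng=rng))
```

Under this assumed schedule, worse positions are accepted fairly often in early iterations and rarely in late ones, which lets the search escape local extrema at the start and settle later.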
3.2. Construction of SVM Classifier Based on Improved FOA
The framework of classifier construction and the flowchart of SVM parameter optimization based on improved FOA are shown in Figures 3 and 4, respectively. The process of classifier construction consists of three steps, namely, obtaining the classifier parameters, training the classifier, and testing the classifier.
The parameters of SVM, the penalty factor $C$ and the RBF kernel function parameter $\gamma$, have great influence on the performance of the classifier. $C$ determines the generalization ability of SVM. A small value of $C$ means that the penalty on empirical error is small, which can lead to underfitting. A large value of $C$ means that the penalty on empirical error is large, which can lead to overfitting. The optimal value of $C$ differs across data subspaces, and selecting it well improves the generalization ability. SVM can map the input data from a low-dimensional space into a high-dimensional space by the kernel function. Vapnik found that the parameters of the kernel function and the penalty factor have great influence on the performance of SVM. So the selection of the penalty factor $C$ and the RBF kernel function parameter $\gamma$ is important.
The detailed process for optimizing the SVM parameters, the penalty factor $C$ and the RBF kernel function parameter $\gamma$, is as follows.
Step 1. The population position is initialized by (13).
Step 2. For each fruit fly, a random flying direction and distance are assigned, as given in (19), where the index runs over the individuals in the population.
Step 4. The smell concentration determination value $S$ is zoomed to get $C$ and $\gamma$, as given in (20). The zoom multiples of $C$ and $\gamma$ can be obtained from prior knowledge.
Step 5. The smell concentration is calculated by (16). The number of folds $k$ is fixed, and the value of the FNR weight is discussed in the next section.
Step 7. The fruit fly with the best smell concentration is found as in (21), which yields the best smell concentration, the individual fruit fly that attains it, and the position of that individual.
Step 8. The axes and position of the best smell concentration are saved. The fruit fly population moves to this position by vision, as given in (22).
Step 9. Steps 2 to 7 are iterated. If the smell concentration is better than the previous one, Step 8 is executed. If the termination condition is satisfied, the optimal parameters are returned.
4. Results and Discussion
4.1. Experimental Data
In this paper, abdominal CT images are used in experiments, which are provided by the radiology department of a hospital in Shenyang, China. Their resolution is pixels, the scan slice thickness is 2 mm, and the format is DICOM. For the purpose of algorithm simulation, the DICOM image is transformed into BMP image. The grayscale is 256 and the resolution is . The detailed information of dataset is shown in Table 1.
4.2. Evaluation Criteria
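According to the confusion matrix, which is shown in Table 2, the evaluation criteria are calculated. In this paper, the evaluation criteria consist of False Positive Rate (FPR), False Negative Rate (FNR), Accuracy, Precision, F1 value, and the running time of the algorithms. The mean square errors of the evaluation criteria over repeated experiments are also used to evaluate the stability of the algorithm:
$$\mathrm{FPR} = \frac{FP}{FP + TN}, \quad \mathrm{FNR} = \frac{FN}{TP + FN}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives, and $\mathrm{Recall} = TP / (TP + FN) = 1 - \mathrm{FNR}$.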
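The evaluation criteria named above follow directly from the four cells of a binary confusion matrix, as the following sketch shows. The function name and the example counts are hypothetical and are not taken from Table 2 or Table 3.

```python
def evaluation_criteria(tp, fn, fp, tn):
    """Compute FPR, FNR, Accuracy, Precision, and F1 from confusion counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = 1.0 - fnr  # sensitivity is the complement of FNR
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"FPR": fpr, "FNR": fnr, "Accuracy": accuracy,
            "Precision": precision, "F1": f1}


# Hypothetical counts: 20 positives all detected, 2 false alarms in 50 negatives.
for name, value in evaluation_criteria(tp=20, fn=0, fp=2, tn=48).items():
    print(f"{name}: {value:.4f}")
```

A classifier that labels every sample negative gets an FPR of 0 and an FNR of 1 under these definitions, yet its Accuracy still equals the share of negatives in the data, which is why Accuracy alone is a poor fitness function for imbalanced medical data.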
4.3. Prior Knowledge
Because parameter optimization is sensitive to the initial search range, we perform a statistical analysis of the penalty factor $C$ and the RBF kernel function parameter $\gamma$ to obtain prior knowledge of the parameters. The result is shown in Figure 5.
Figure 5: The influence of the two parameters, shown in panels (a) and (b).
From Figure 5, we can obtain the initial scope of and by QSA that is and , respectively. And the scope of optimal solution is and , respectively. The scaling of and is and , respectively.
4.4. Determination of FNR Weight
In actual treatment, if a patient who is ill is diagnosed as disease-free, the treatment is delayed and the chance of cure is reduced. Conversely, if a disease-free patient is diagnosed as ill, the patient undergoes further examination, which corrects the mistake. Therefore, FNR is considered more important than FPR, and the weight of FNR should be greater than that of FPR; that is to say, $\omega > 0.5$.
For each value of the FNR weight $\omega$, the experiment was run 10 times. Then we compared the mean values of the evaluation criteria and their mean square errors.
Figures 6 and 7: (a) Mean value and (b) mean square error of each optimized SVM parameter.
Figure 8: (a) Mean value of FPR and FNR; (b) mean square error of FPR and FNR.
Figure 9: (a) Mean value of Accuracy, Precision, and F1; (b) mean square error of Accuracy, Precision, and F1.
From Figures 6 and 7, it can be seen that the optimized parameters are found to be in line with prior knowledge. According to the mean square error of , when , the algorithm is the most stable, and is the second. According to the mean square error of , when , the algorithm is the most stable, and is the second.
The algorithm performs better when the values of FNR and FPR are smaller. From Figure 8, for FPR, the mean value and mean square error are smallest when the FNR weight is 0.9; for FNR, they are smallest when the weight is 0.7, 0.9, or 1. Greater values of Accuracy, Precision, and F1 indicate better performance of the algorithm. From Figure 9, when the weight is 0.9, the mean values of Accuracy, Precision, and F1 are the greatest and the mean square error is the smallest. Therefore, the final value of the FNR weight is determined as 0.9.
4.5. Experimental Results and Analysis
Ten independent runs of SVM optimized by improved FOA (IFOA-SVM) are performed. The experimental results are shown in Table 3.
From Table 3, the mean values of the two optimized parameters are 0.79541 and 1167.99183. The average FPR is 4%, FNR is 0, Accuracy is 97.14%, Precision is 91.41%, F1 is 95.39%, and the running time is 31.197 s.
Compared with other classifiers, IFOA-SVM performs better, as shown in Figures 10 and 11. The comparison of running time is shown in Figure 12. Fisher is the Fisher linear classifier, BPNN is the BP neural network, SVM is the standard SVM, and ACO-SVM, FOA-SVM, and QSA-SVM are SVM classifiers optimized using the ant colony algorithm, the fruit fly optimization algorithm, and quantum simulated annealing, respectively. IFOA-SVM is the proposed method.
Figure 10: (a) Mean value of FPR and FNR for different classifiers; (b) mean square error of FPR and FNR for different classifiers.
Figure 11: (a) Mean value of Precision, F1, and Accuracy for different classifiers; (b) mean square error of Precision, F1, and Accuracy for different classifiers.
In Figure 10, SVM and ACO-SVM yield an FNR of 100% and an FPR of 0; that is to say, all patients are diagnosed as disease-free. This is unacceptable in actual diagnosis. The FNRs of BPNN and Fisher are 88.75% and 56.25%, and their FPRs are 60% and 49.5%, so BPNN and Fisher lack credibility. The FPR and FNR of FOA-SVM are 0 and 35%, so missed diagnoses occur fairly often. The FPR and FNR of QSA-SVM are 6% and 5%, so a few patients may be missed. The FPR and FNR of the proposed IFOA-SVM are 4% and 0, which is better than the other methods. IFOA-SVM achieves the best sensitivity and stability.
In Figure 11, for the average Precision, FOA-SVM is the best at 100%, and IFOA-SVM takes second place at 91.41%. From Figure 11(b), FOA-SVM is also the most stable in Precision. For the average F1, the proposed IFOA-SVM achieves 95.39%, which is the best; QSA-SVM takes second place at 90.44%, and IFOA-SVM is more stable than QSA-SVM. For the average Accuracy, IFOA-SVM is the best at 97.14%, followed by QSA-SVM at 94.29% and FOA-SVM at 90%. SVM and ACO-SVM reach 71.43%, and Fisher and BPNN are below 50%. In terms of the mean square error of Accuracy, IFOA-SVM is the most stable.
In Figure 12, Fisher and SVM cost the least time, 0.03 s. BPNN takes 1.244 s, FOA-SVM 14.142 s, IFOA-SVM 31.197 s, and QSA-SVM 82.186 s. ACO-SVM costs the most time, 247.092 s. In actual diagnosis, less time is better. The proposed IFOA-SVM is not the fastest, but its running time is acceptable.
ACO-SVM, FOA-SVM, QSA-SVM, and the proposed IFOA-SVM can all be used to optimize SVM parameters. Figures 13 and 14 compare the mean value and mean square error of the optimal parameters, $C$ and $\gamma$, obtained by those methods.
Figure 13: (a) Mean value of $C$; (b) mean square error of $C$.
Figure 14: (a) Mean value of $\gamma$; (b) mean square error of $\gamma$.
In the pancreatic cancer classifier based on ACO-SVM, one parameter is oversized and the other is undersized. With FOA-SVM, the optimal parameters fall outside the estimation interval given by prior knowledge, although FOA is stable in terms of the mean square error of the optimal parameters. When QSA-SVM and IFOA-SVM are used to optimize the SVM parameters, the optimal parameters fall within the estimation interval. The two methods are similarly stable in terms of the mean square error of one parameter, while IFOA-SVM is more stable than QSA-SVM in terms of the mean square error of the other.
In this paper, we introduced quantum concepts into FOA to improve it. A new smell concentration determination function was defined in the improved FOA. The improved FOA was used to optimize the parameters of SVM, and a classifier was constructed based on the optimized SVM. As an application, a pancreatic cancer classifier was established. The proposed method achieved better classification performance for three reasons. First, quantum coding and quantum operations increased the diversity of the population and avoided premature convergence. Second, the redefined smell concentration determination function was better suited to actual diagnostic requirements. Third, the method inherits the advantages of FOA, which is easy to set up, easy to implement, and fast to optimize. Therefore, the proposed method can improve the classification performance for pancreatic cancer images and assist doctors in diagnosing diseases.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The research is supported by the National Natural Science Foundation of China (no. 61272176).
Y. Wang and P. Zhao, “Advances in early diagnosis of pancreatic cancer,” Oncology Progress, vol. 4, no. 4, pp. 327–332, 2006.
F. Fraioli, G. Serra, and R. Passariello, “CAD (computed-aided detection) and CADx (computer aided diagnosis) systems in identifying and characterising lung nodules on chest CT: overview of research, developments and new prospects,” Radiologia Medica, vol. 115, no. 3, pp. 385–402, 2010.
M. Antonie, O. Zaiane, and A. Coman, “Application of data mining techniques for medical image classification,” in Knowledge Discovery and Data Mining, pp. 94–101, 2001.
J. Ramírez, R. Chaves, J. Górriz et al., “Functional brain image classification techniques for early Alzheimer disease diagnosis,” in Bioinspired Applications in Artificial and Natural Computation: Third International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2009, Santiago de Compostela, Spain, June 22–26, 2009, Proceedings, Part II, vol. 5602 of Lecture Notes in Computer Science, pp. 150–157, Springer, Berlin, Germany, 2009.
T. Takada, H. Yasuda, K. Uchiyama, H. Hasegawa, T. Iwagaki, and Y. Yamakawa, “A proposed new pancreatic classification system according to segments: operative procedure for a medial pancreatic segmentectomy,” Journal of Hepato-Biliary-Pancreatic Surgery, vol. 1, no. 3, pp. 322–325, 1994.
S. He, H. J. Cooper, D. G. Ward, X. Yao, and J. K. Heath, “Analysis of premalignant pancreatic cancer mass spectrometry data for biomarker selection using a group search optimizer,” Transactions of the Institute of Measurement and Control, vol. 34, no. 6, pp. 668–676, 2012.
V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
F.-H. Yu and H.-B. Liu, “Structural damage identification by support vector machine and particle swarm algorithm,” Journal of Jilin University Engineering and Technology Edition, vol. 38, no. 2, pp. 434–438, 2008.
X. Xu, J. Jiang, J. Jie, H. Wang, and W. Wang, “An improved real coded quantum genetic algorithm and its applications,” in Proceedings of the International Conference on Computational Aspects of Social Networks (CASoN '10), pp. 307–310, IEEE, Taiyuan, China, September 2010.
S. Xu, H. Zhao, and Y. Xie, “Grey SVM with simulated annealing algorithms in patent application filings forecasting,” in Proceedings of the International Conference on Computational Intelligence and Security Workshops (CIS '07), pp. 850–853, IEEE, Harbin, China, December 2007.