Abstract

Dynamic magnetic resonance images (DMRIs) have become one of the major tools for diagnosing nasal tumors in recent years. This research proposes a new method that automatically detects the tumor region in DMRI and compares the tumor detection performance of three classifiers: AdaBoost, SVM, and a Bayes-Gaussian classifier. Five metrics, namely sensitivity, specificity, accuracy, match percent, and correspondence ratio, are used to evaluate each classifier. The experimental results show that SVM has the best sensitivity, while the Bayesian classifier has the best specificity and accuracy. Moreover, the tumor regions detected by each of the three classifiers are shown, marked in red.

1. Introduction

MRI provides a variety of cross-sectional digital images that show the structure of the nasopharynx and the source of the disease. Nasopharyngeal carcinoma (NPC) is a malignant tumor that is located at the skull base and often occurs in South Asia. Tumor volume is a major prognostic indicator for NPC. Therefore, it is necessary to estimate the tumor volume. DMRI is a major nasal tumor detection tool that has been widely used in radiation research [1]. Huang et al. [2] apply an RSI (relative signal increase) curve to identify recurrent tumors. Hsu et al. [3] use pharmacokinetic analysis to identify the NPC region. Zhou et al. [4] propose a fuzzy clustering method to filter out normal tissue regions in T1-weighted (T1W) and contrast-enhanced T1W (CET1W) images. Lee et al. [5] use Bayesian probability calculation and local histograms in T2-weighted images, compared against the intensity variation in T1W images, to detect the NPC region. Huang and Chang [6] use Fuzzy C-means (FCM) [7] and grey prediction to separate tumor from normal tissue. Zhou et al. [8] use knowledge-based fuzzy clustering (KBFC), maximum likelihood, and seed growing to identify the tumor region and analyze the accuracy of each method.

Support Vector Machines (SVMs) [9, 10] have been widely used in tumor segmentation. Zhang et al. [11] compare the tumor segmentation results of one-class SVM and two-class SVM, using MP (Match Percent) and CR (Correspondence Ratio) to evaluate performance. Their results show that one-class SVM is superior to two-class SVM. Zhou et al. [12] use two-class SVM and the kernel trick [13] to derive a new algorithm, called the query-based two-class SVM classifier, which outperforms the traditional MLP-based classifier [14]. It can be used by radiologists as a preoperative diagnostic tool. Ritthipravat et al. [15] use a region growing method and a probabilistic map to find candidate tumor regions. Zhou et al. [12] also mention that, in current diagnostic imaging and radiation therapy, radiologists or radiation therapists still need to delineate the extent of the tumor manually. Therefore, we propose a new algorithm that detects the tumor and draws the candidate regions automatically for radiologists.

2. System Framework

In the training process, a tumor ROI is extracted by hand as the ground truth. The data therefore fall into two groups: normal and tumor. Three classifiers, AdaBoost, SVM, and Bayes-Gaussian, are used to classify these two groups of data. After training, the proposed system is able to distinguish the tumor region automatically, and the performance of each classifier is reported.

2.1. System Framework

Dynamic MRI (DMRI) [16] is a sequence of MRI scans acquired at 0, 5, 30, 60, 120, 300, … seconds after a gadolinium contrast agent is injected into the patient. After the injection, the gray-level intensity of the patient's tumor gradually increases and finally reaches a stable state. With this continuous scanning technique, the tumor region and normal tissue show different changes in gray-scale value: the changes in the tumor region are both larger and faster than those in normal tissue. This characteristic allows a preliminary observation of the tumor's size and region. However, it is very difficult to identify the tumor region with the naked eye. Therefore, an effective detection system is needed to help doctors make the diagnosis. Each MRI image has pixels. This research used six groups of patient data for training and testing. The number of slices is not the same in every MRI group. Table 1 lists these six groups, labeled E1322, E1563, E1971, E2074, E6618, and E6632.

Because the gray-level intensity at the tumor location differs from one time point to the next after the agent is injected, we first compute the intensity differences within the DMRI sequence. The difference is taken between the MRI obtained at each later time point and the first image of the DMRI sequence (acquired before the agent is injected).

As time passes, the tumor causes some grayscale intensity values to increase; the grayscale values range from 0 to 255, where 0 represents black and 255 represents white. A threshold is set according to the histogram distribution. Its purpose is to remove the background and regions that are unlikely to be tumor, and thereby to increase the detection accuracy. Computing the intensity difference between each later DMRI and the first DMRI yields a stack of consecutive difference images.

Then the ROI is chosen. The ROI is the tumor area; any region outside it belongs to the normal tissue area. These two distinct sets of data are used for training and testing by several classification tools based on the ground truth [16]. Several evaluation parameters are used to confirm the accuracy. The flowchart of the overall system framework is shown in Figure 1.
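The difference-and-threshold preprocessing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the clipping of negative differences to 0, the toy pixel values, and the threshold of 30 are all hypothetical (the paper derives its threshold from the histogram distribution).

```python
def difference_stack(frames):
    """Subtract the first (pre-injection) frame from every later frame.

    frames: list of 2-D images, each a list of rows of gray levels (0..255).
    Returns one difference image per later frame. Negative differences are
    clipped to 0, which is an assumption made for this sketch.
    """
    baseline = frames[0]
    stack = []
    for frame in frames[1:]:
        diff = [
            [max(pixel - base, 0) for pixel, base in zip(row, base_row)]
            for row, base_row in zip(frame, baseline)
        ]
        stack.append(diff)
    return stack


def candidate_mask(diff_image, threshold):
    """Mark pixels whose enhancement exceeds the threshold (1 = candidate)."""
    return [[1 if value > threshold else 0 for value in row] for row in diff_image]


# Toy 2x3 sequence: one pre-injection frame followed by two later frames.
frames = [
    [[10, 10, 10], [10, 10, 10]],
    [[12, 60, 11], [10, 80, 9]],
    [[13, 90, 12], [11, 120, 10]],
]
stack = difference_stack(frames)
mask = candidate_mask(stack[-1], threshold=30)  # hypothetical threshold
```

In this toy sequence only the middle column enhances strongly, so it is the only part of the image that survives the threshold and would be passed on to the classifiers.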

2.2. Evaluation Parameters

This paper uses five evaluation parameters: sensitivity, specificity, accuracy, MP (Match Percent), and CR (Correspondence Ratio). They are defined as follows:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}, \qquad \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\text{MP} = \frac{TP}{GT}, \qquad \text{CR} = \frac{TP - 0.5 \times FP}{GT},$$

where positive represents the tumor region and negative represents the normal tissue. TP, FP, TN, FN, and GT are defined as follows: TP: true positive, when the system determines a region that is in fact tumor to be a tumor region; FP: false positive, when the system determines a region that is not actually tumor to be a tumor region; TN: true negative, when the system determines a region that is in fact normal tissue to be normal tissue; FN: false negative, when the system determines a region that is in fact tumor to be normal tissue; and GT: ground truth (the standard tumor region, that is, the region the doctor judged to be tumor).
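As a worked illustration of these parameters, the sketch below computes all five from a predicted mask and a ground-truth mask. It assumes the commonly used definitions MP = TP/GT and CR = (TP - 0.5 FP)/GT, with GT counted as the number of ground-truth tumor pixels; the function name and the toy labels are hypothetical.

```python
def evaluate(predicted, ground_truth):
    """Compute the five metrics from flat 0/1 label sequences (1 = tumor).

    Assumes both classes occur in ground_truth, so no denominator is zero.
    """
    pairs = list(zip(predicted, ground_truth))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)
    tn = sum(1 for p, g in pairs if p == 0 and g == 0)
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)
    gt = tp + fn  # assumption: GT is the count of ground-truth tumor pixels
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "MP": tp / gt,
        "CR": (tp - 0.5 * fp) / gt,
    }


# Toy example: 10 pixels, 4 of which are tumor in the ground truth.
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
ground_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
metrics = evaluate(predicted, ground_truth)
```

Here TP = 3, FP = 1, TN = 5, and FN = 1, so sensitivity is 0.75, accuracy is 0.8, and CR is (3 - 0.5) / 4 = 0.625. CR penalizes false positives, which MP ignores.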

2.3. AdaBoost

AdaBoost evolved from Boosting. Boosting, which means promotion, traces back to Valiant [17] and belongs to the PAC (Probably Approximately Correct) framework of machine learning. Kearns and Valiant [18] proposed the concept of weak learnability, refining Valiant's concept. The idea is that a weak learner, combined over several iterations through a voting mechanism (majority vote), can perform as well as a strong learner. This was confirmed by Schapire [19]. Freund and Schapire [20] unified these concepts in the AdaBoost method. In AdaBoost, a weak learner is upgraded step by step over several training iterations until it becomes a strong learner.

In the training process, all samples start with equal weights, and a first result is obtained from a weak learner. After the voting mechanism is applied, misclassified samples are given larger weights and correctly classified samples are given smaller weights. In this way, the samples that are classified poorly are forced to receive more training. The training error therefore decreases gradually over several iterations, and eventually the weak learner becomes a strong learner. Much recent AdaBoost research focuses on the choice of weak learner. According to many reports, the weak learner does not need to be highly accurate; it only has to perform slightly better than chance (about 50% for two classes). Leshem and Ritov [21] proposed an AdaBoost method that uses Random Forests [22] as the weak learner; this algorithm classifies very well and reduces the classification error rate. We adopt Leshem's method [21] as the AdaBoost classifier in this paper.
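The reweighting loop described above can be sketched as follows. Note that this sketch swaps in one-dimensional decision stumps as the weak learner purely to stay short, whereas the paper adopts Random Forests as the weak learner [21]. The function names and the synthetic data are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Return (error, threshold, polarity) with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost; labels are +1 (tumor) and -1 (normal)."""
    n = len(xs)
    weights = [1.0 / n] * n  # every sample starts with an equal weight
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight; correct ones lose weight.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote over all stumps."""
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Synthetic 1-D data that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
```

On this data the best single stump misclassifies three of the ten samples, while the weighted vote of three stumps classifies all ten training samples correctly. This is the sense in which the weak learner is upgraded into a strong one.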

2.4. Support Vector Machines (SVMs)

Support vector machines were proposed by Cortes and Vapnik [10]; they are based on statistical learning theory and mainly address binary classification. They are also widely used in practical problems such as pattern recognition, document classification, and bioinformatics. The idea of SVM is to construct an optimal hyperplane from the data and to use this hyperplane as the boundary that separates two groups of data. In recent years, SVM research has focused on searching for the cost and gamma parameters, because these parameters strongly affect the correct classification rate. Lin et al. [23] proposed a new SVM classification tool, Libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#4). Libsvm improves on the original SVM concept and provides a tool that searches for the best SVM parameters automatically. Libsvm achieves very good classification accuracy. Therefore, we use Libsvm to carry out the classification and to analyze the results.
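To make the roles of cost and gamma more concrete, the sketch below shows how gamma enters a radial basis function (RBF) kernel and how candidate (cost, gamma) pairs can be enumerated on an exponential grid. It assumes an RBF kernel, which the paper does not state explicitly. The function names and the grid ranges are illustrative assumptions rather than the settings used in the experiments, and no SVM is trained here.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def parameter_grid(log2_cost=range(-5, 16, 2), log2_gamma=range(3, -16, -2)):
    """Enumerate (cost, gamma) candidates on an exponential grid.

    The default exponent ranges are assumptions chosen for illustration.
    """
    return [(2.0 ** c, 2.0 ** g) for c in log2_cost for g in log2_gamma]


grid = parameter_grid()

# A larger gamma makes similarity fall off faster with distance.
near = rbf_kernel([0.0], [1.0], gamma=0.1)
far = rbf_kernel([0.0], [1.0], gamma=1.0)
```

In a full parameter search, each candidate pair would be scored by cross-validation with an SVM trainer and the best-scoring pair kept. Cost controls how heavily training errors are penalized, while gamma controls how local the decision boundary is.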

2.5. Bayes Classifier for Gaussian Pattern Classes

The Bayes classifier is widely used in medical image segmentation. It derives from Bayes' theorem and assigns an unknown sample to a category through statistical probability analysis so as to minimize the classification error. It is a supervised learning method: the categories of the patterns must be known in advance, and the classifier learns from training samples so that it can classify future data effectively. Based on Bayes' theorem, it converts prior probabilities into posterior probabilities and, under the assumption that the attributes are conditionally independent given the class, predicts the classification outcome. In principle, the Bayes classifier uses the relationships among attributes, learned from the training samples, to establish the classification mechanism. Bayesian classification rests on statistical principles; each class of samples is characterized by its mean value and standard deviation. Under a normal distribution, the probability density of a variable $x$ is defined as follows [24]:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),$$

where $\sigma$ is the standard deviation and $\mu$ is the mean value.
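A minimal sketch of a Bayes classifier for Gaussian pattern classes with a single feature is given below. It assumes the standard normal density with a per-class mean and standard deviation, and it estimates the class priors from class frequencies. The function names and the toy intensity-difference values are hypothetical.

```python
import math


def fit_gaussian(samples):
    """Return the mean and (population) standard deviation of one class."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, math.sqrt(variance)


def gaussian_pdf(x, mean, std):
    """Normal density with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * std)


def classify(x, models, priors):
    """Choose the class with the largest prior times likelihood."""
    return max(models, key=lambda c: priors[c] * gaussian_pdf(x, *models[c]))


# Toy training data: intensity differences for each class (made-up values).
training = {
    "tumor": [80, 95, 110, 120, 105],
    "normal": [5, 12, 8, 20, 15],
}
models = {label: fit_gaussian(values) for label, values in training.items()}
total = sum(len(values) for values in training.values())
priors = {label: len(values) / total for label, values in training.items()}

label_high = classify(90, models, priors)
label_low = classify(10, models, priors)
```

With these toy values the tumor class has mean 102 and the normal class has mean 12, so a pixel with an intensity difference of 90 is assigned to the tumor class and one with a difference of 10 to the normal class.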

3. Experimental Results

This section shows experimental results for the three classifiers: AdaBoost, SVM, and the Bayes-Gaussian classifier. Table 2 shows the sensitivity, specificity, and accuracy values obtained with the AdaBoost classifier, and Table 3 shows its MP and CR values. Figure 2 shows the detection result images for the AdaBoost classifier, where (a) is the ground truth and, in (b), blue marks true positives and red marks false negatives. Table 4 shows the sensitivity, specificity, and accuracy values obtained with the SVM classifier, and Table 5 shows its MP and CR values. Figure 3 shows the detection result images for the SVM classifier, where (a) is the ground truth and, in (b), blue marks true positives and red marks false negatives. Table 6 shows the sensitivity, specificity, and accuracy values obtained with the Bayes-Gaussian classifier, and Table 7 shows its MP and CR values. Figure 4 shows the detection images for the Bayes-Gaussian classifier, where (a) is the ground truth, (b) shows the detection result in white, and, in (c), blue marks true positives and red marks false negatives.

4. Conclusions

The probability of tumor recurrence in nasopharyngeal carcinoma is high, and radiation therapy is far less effective after a relapse than when it is used for the first time. Surgical resection is then used, so preoperative diagnosis becomes extremely important. This study proposed a new method and applied it to the data through repeated training and testing in order to improve the tumor identification rate. On this foundation, a nasopharyngeal tumor recognition system for medical use can be built to support physicians in disease diagnosis, treatment, and pathology monitoring. This study uses the AdaBoost algorithm, support vector machines, and a Bayesian classifier to distinguish the tumor region from the normal region. The three classifiers are compared on five parameters: MP, CR, sensitivity, specificity, and accuracy. The final results show that SVM has the best sensitivity, while the Bayesian classifier has the best specificity and accuracy. Some morphological operations, such as opening and closing, are used to make the identified tumor region more complete. The proposed system is not intended to replace the physician's diagnosis. Rather, its visual aids are meant to reduce physicians' misjudgments and thereby enhance the overall diagnostic accuracy.