A Five-Level Wavelet Decomposition and Dimensional Reduction Approach for Feature Extraction and Classification of MR and CT Scan Images
This paper presents a two-dimensional wavelet based decomposition algorithm for classification of biomedical images. The two-dimensional wavelet decomposition is carried out up to five levels for the input images. Histograms of the decomposed images are then used to form the feature set. This feature set is further reduced using probabilistic principal component analysis. The reduced set of features is then fed into either a nearest neighbor (KNN) algorithm or a feed-forward artificial neural network (ANN) to classify images. The algorithm is compared with three other techniques in terms of accuracy. On average, the proposed algorithm is more accurate than the first, second, and third algorithms by up to 3.3%, 12.75%, and 13.75%, respectively, using KNN and by up to 6.22%, 13.9%, and 14.1% using ANN. The dataset used for comparison consisted of CT Scan images of lungs and MR images of heart obtained from different sources.
Biomedical images such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT) Scan, and ultrasound images have been recognized in recent times as a powerful tool for the detection of diseases. Various supervised and unsupervised algorithms have been proposed to analyze biomedical images for purposes like segmentation of an organ, identification of the disease affected area, classification of images, and so forth [1–3]. The following subsections summarize various algorithms used for biomedical image classification.
1.1. MRI Related Work
Chaplot et al. used the Daubechies wavelet transform to extract the features of MR images of patients suffering from brain tumor. A self-organizing map and support vector machines (SVM) were then used to distinguish images of patients with a tumor from images of patients without one. Saravanan and Ramachandran further extended this approach to use the Daubechies wavelets from db1 to db15, and the wavelet having the highest potential was selected. The coefficients extracted for that wavelet are used for classification with the backpropagation algorithm.
White matter hyperintensities in the brain are commonly observed in ageing people. Griffanti et al. proposed a method in which correlation amongst images was identified using the mean and standard deviation of various features like cognition, tissue microstructure, and so forth. This correlation made it possible to identify similar MR images with white matter disturbances. Ramakrishnan and Sankaragomathi used SVM along with sequential minimal optimization (SMO) for classification and modified region growing (MRG) with grey wolf optimization (GWO) for segmentation. The features fed into these two systems were the grey level cooccurrence matrix, maximum intensity, and the local Gabor XOR pattern. The proposed framework was compared with other similar techniques on the basis of accuracy and was claimed to be better. Nayak et al. diagnosed pathological brain by using the fifty largest coefficients from a level-5 discrete curvelet transform and then reducing the feature vector using PPCA. SVM is then used to classify between healthy and pathological brain. Other authors proposed a system that employs a contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. Subsequently a two-dimensional wavelet transform is applied and correlated features are extracted using a symmetric uncertainty ranking based filter. Zhang et al. used stationary wavelet entropy to extract features from MR images, and a feed-forward neural network classifier was then employed to classify between images of healthy people and patients suffering from hearing loss. Another scheme identifies pathological brain by using a simplified pulse-coupled neural network (SPCNN) for region of interest (ROI) segmentation and the fast discrete curvelet transform (FDCT) for feature extraction. PCA and linear discriminant analysis (LDA) are then used to reduce the features, and a probabilistic neural network classifies the images. The system achieved an accuracy of up to 99.5%.
Maitra and Chatterjee used the slantlet transform to extract features. The slantlet transform is an extension of the discrete wavelet transform (DWT) in which the support of the discrete time basis functions is minimized. The number of features thus extracted was limited to six, and these were fed into a feed-forward artificial neural network for classification. The results obtained showed 100 percent accuracy, compared with other DWT based classification algorithms. Nayak et al. further extracted the 2D wavelet components of an MR image and reduced them using probabilistic principal component analysis (PPCA). Finally, with the reduced feature set of thirteen, the authors used an AdaBoost random forest classifier and claimed 100 percent accuracy. Zhang et al. applied level 3 decomposition via the Haar wavelet transform to obtain the features and then applied principal component analysis (PCA) to reduce them. The backpropagation algorithm was then used to classify the images as normal or diseased. The image dataset used was T2-weighted MR brain images from Harvard University. Sauwen et al. compared various unsupervised classification techniques for brain tumor segmentation using two datasets of their own. Chen et al. first defined a cluster center and then used a simple extenics based correlation function to identify the relation between features and remove the redundant ones. Particle swarm optimization was then applied to classify the images. The accuracy and error rates were compared to those of similar algorithms, and the proposed one was found to be superior.
Termenon used extreme learning machines to extract features from MR images and applied majority vote classification to classify them. Cabria and Gondra fused the segmented parts of a brain MR image to detect brain tumors. The fusion is achieved using intersection and union methods. AdaBoost with SVM is then applied for classification.
1.2. CT Scan or Other Biomedical Images Related Work
Sudarshan et al. presented a review of the application of wavelets in the detection of different types of cancer, mainly comparing wavelet analyses of ultrasound images to measure performance. Other authors compared various artificial neural network based classifiers which can be used for classification and clustering in biomedical images. Im and Park also proposed a feature based classifier using ANN. The algorithm was tested for classification accuracy on a voting database and the Monks problem. Polat et al. used a fuzzy based algorithm for classification of breast cancer and liver disorders. They normalized the input data and then obtained artificial recognition balls (ARBs) for them. El-Dahshan et al. used DWT to extract the components, which were reduced by first computing a covariance matrix. The eigenvalues of this covariance matrix are then sorted, and the feature vector is selected from the components with the largest eigenvalues. This method is also known as PCA. Both KNN and ANN were then applied to classify the images. Saritha et al. extracted the DWT components based on the Daubechies wavelet up to 8 levels. The extracted features are arranged on a spider web plot. The area components under the edges of these spider web plots are fed into a probabilistic neural network for classification. Trigui et al. distinguished CT images showing prostate cancer from regular ones by extracting spectrum signal based information, which was analyzed to retrieve the choline and citrate levels in the prostate gland. A global feature vector was constructed by combining these two feature vectors, and a supervised learning algorithm, namely, SVM, was then applied for classification.
A lot of wavelet based techniques have been used by researchers for classification of biomedical images. To date, however, wavelet decomposition based approaches have not considered a feature set formed by concatenating the histograms of the five images obtained by wavelet decomposition of a biomedical image up to five levels. The proposed work extends the wavelet based methodology for the classification of biomedical images. It decomposes an image up to five levels using two-dimensional wavelet decomposition. Wavelet transforms have proven to be an efficient way of extracting information from images and are less complex than techniques like DWCT, curvelet transform, and so forth; thus they are used here. The approximation coefficient matrix at each level is selected and its histogram is generated. The five histograms thereby obtained are concatenated to form a feature vector. The dimensionality of this feature vector is further reduced by using probabilistic PCA. The reduced feature vector is then used for classification by either KNN or ANN. This approach is found to be more robust than other approaches, as discussed in Section 3.
The rest of the paper is organized as follows: Section 2 describes the methodology used for classification. It describes various steps of the proposed algorithm in detail. Section 3 discusses the results that are obtained when the proposed work is compared with algorithms in [23, 24, 26]. In Section 4, conclusion and possible future work are discussed.
The proposed algorithm consists of the following steps.
2.1. Two-Dimensional Discrete Wavelet Decomposition
The discrete wavelet transform (DWT) of a one-dimensional signal $x$ can be calculated by passing it through a high pass and a low pass filter simultaneously. If the low pass filter has impulse response $g$, then the DWT can be evaluated by calculating the convolution of the original signal with the impulse response as
$$y[n] = (x * g)[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[n-k]. \tag{1}$$
Here $*$ denotes convolution. The signal is simultaneously decomposed with a high pass filter.
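To make the filter-and-downsample step concrete, the following is a minimal sketch of one DWT level on a one-dimensional signal. It is not the authors' implementation: the filter taps are the standard 4-tap Daubechies coefficients, the high pass filter is derived from the low pass one by the usual quadrature mirror relation, and the signal is treated as periodic at its boundaries. All three choices, and the names `LOW`, `HIGH`, `filter_and_downsample`, and `dwt_1d`, are assumptions made for illustration.

```python
import math

# Standard 4-tap Daubechies low pass taps (assumed; the paper's own filter
# equations are not reproduced in this text).
S3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
LOW = [(1 + S3) / NORM, (3 + S3) / NORM, (3 - S3) / NORM, (1 - S3) / NORM]
# Quadrature mirror high pass filter: g[k] = (-1)^k * h[3 - k].
HIGH = [(-1) ** k * LOW[3 - k] for k in range(4)]


def filter_and_downsample(signal, taps):
    """Circular convolution y[n] = sum_k taps[k] * x[n - k], keeping even n."""
    n = len(signal)
    return [
        sum(taps[k] * signal[(i - k) % n] for k in range(len(taps)))
        for i in range(0, n, 2)
    ]


def dwt_1d(signal):
    """One DWT level: returns (approximation, detail) coefficient lists."""
    return filter_and_downsample(signal, LOW), filter_and_downsample(signal, HIGH)


if __name__ == "__main__":
    ramp = [float(i) for i in range(16)]
    approx, detail = dwt_1d(ramp)
    # Away from the periodic wrap-around, the detail coefficients of a linear
    # ramp vanish because this filter has two vanishing moments.
    print(len(approx), len(detail), max(abs(d) for d in detail[2:]))
```

Each output list is half the length of the input, which is the one-dimensional counterpart of the halving of image size described for the two-dimensional case.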
The wavelet decomposition is done using the Daubechies-4 wavelet. The high pass and low pass filters used are given in (2) and (3), respectively, which define the wavelet sequences of the high and low pass filters used for convolution. To compute the DWT of a two-dimensional image, the original image is convolved along the $x$ and $y$ directions by low pass and high pass filters as shown in Figure 1. The images obtained are downsampled by columns, indicated by $2 \downarrow 1$; that is, only the even indexed columns are kept [27, 29]. The resultant images are then convolved again with high pass and low pass filters and downsampled by rows, denoted by $1 \downarrow 2$, which ultimately yields four subband images of half the size of the original image: $LL_1$, $LH_1$, $HL_1$, and $HH_1$. $LH_1$, $HL_1$, and $HH_1$ contain the horizontal, vertical, and diagonal information of the image. $LL_1$ is the approximation coefficient matrix and contains the maximum information of the image. $LL_1$ is selected for the next round of decomposition in the same manner as the original image, and from that round the approximation coefficient matrix $LL_2$ is extracted. Similarly, the image is decomposed by two-dimensional wavelet decomposition up to five levels. The approximation coefficients obtained, that is, $LL_1$, $LL_2$, $LL_3$, $LL_4$, and $LL_5$, are then used to form the feature set as demonstrated in the following subsections.
The Daubechies-4 wavelet has two vanishing moments and thus extracts better features than simpler wavelets like Haar, while achieving results similar to those of complex wavelets like Gabor. It also takes less time to produce results than complex wavelet techniques and is thus a suitable choice here. The decomposition is carried out up to five levels since at the sixth level the image loses most of its details. Experimental results also confirmed that classification accuracy was lower with four-level decomposition and decreased again with six-level decomposition.
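The repeated decomposition of the approximation subband can be sketched as follows. This is an illustrative NumPy version rather than the authors' MATLAB code: it computes only the low pass (approximation) branch, uses the standard 4-tap Daubechies low pass taps, and handles image borders by periodic extension. The function names and the border handling are assumptions.

```python
import numpy as np

# Standard 4-tap Daubechies low pass taps (assumed for this sketch).
S3 = np.sqrt(3.0)
LOW = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2))


def lowpass_downsample(a, axis):
    """Circular low pass convolution along `axis`, then keep even indices."""
    # np.roll(a, k)[n] equals a[n - k], so this sum is a circular convolution.
    filtered = sum(tap * np.roll(a, k, axis=axis) for k, tap in enumerate(LOW))
    return np.take(filtered, np.arange(0, a.shape[axis], 2), axis=axis)


def approximation_pyramid(image, levels=5):
    """Return the approximation matrices of levels 1..levels as a list."""
    approximations = []
    current = np.asarray(image, dtype=float)
    for _ in range(levels):
        # Filter along rows and drop odd columns, then the same for rows.
        current = lowpass_downsample(current, axis=1)
        current = lowpass_downsample(current, axis=0)
        approximations.append(current)
    return approximations


if __name__ == "__main__":
    pyramid = approximation_pyramid(np.ones((64, 64)))
    print([level.shape for level in pyramid])
```

For a 64 × 64 input the five approximation matrices have sizes 32 × 32 down to 2 × 2, which illustrates why very little detail is left to exploit beyond the fifth level for small images. A wavelet library such as PyWavelets would normally be used in practice; the hand-written filter here only serves to keep the sketch self-contained.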
2.2. Feature Extraction and Dimensionality Reduction Using Probabilistic PCA
The histograms of the five approximation coefficient matrices, that is, $LL_1$, $LL_2$, $LL_3$, $LL_4$, and $LL_5$, are computed. To compute each histogram, we consider 256 equally spaced bins and count the number of pixels that belong to each bin. Thus even if the image sizes are different, we get a histogram of size 256 × 1 for all the images. The five histograms are therefore vectors of size 256 × 1 each. These five histograms are concatenated, and the concatenated matrix is a feature set of size 256 × 5.
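The histogram step can be sketched in a few lines of NumPy. The text fixes the number of bins at 256 but does not state the bin range, so this sketch assumes that the bins span each matrix's own minimum and maximum, which is the default behaviour of `np.histogram`. The function name `histogram_features` is hypothetical.

```python
import numpy as np


def histogram_features(approximations, bins=256):
    """Stack one `bins`-bin histogram per approximation matrix as columns."""
    columns = []
    for coeffs in approximations:
        # Equally spaced bins between the matrix's min and max (assumption).
        counts, _ = np.histogram(np.ravel(coeffs), bins=bins)
        columns.append(counts)
    return np.column_stack(columns)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for five approximation matrices of decreasing size.
    matrices = [rng.random((s, s)) for s in (32, 16, 8, 4, 2)]
    features = histogram_features(matrices)
    print(features.shape)
```

Because every histogram has the same number of bins, the feature matrix has the same 256 × 5 shape regardless of the size of the input image.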
The feature set is then reduced by applying probabilistic principal component analysis (PPCA). Principal component analysis maps a given set of dimensions into a lower dimensional space. In PPCA the concept of an associated likelihood function is used. It extracts a $k$-dimensional vector from a $p$-dimensional vector variable by the relationship
$$y^{T} = W * x^{T} + \mu + \varepsilon,$$
where $y$ is the row vector of observed variables, $*$ stands for multiplication, $x$ is the row vector of latent variables, and $\varepsilon$ is the isotropic error term. The $p$-by-$k$ weight matrix $W$ relates the latent and observation variables, and the vector $\mu$ permits the model to have a nonzero mean. In our case $y$ is of size 256 × 5 and $W$ is a predefined weight matrix. Thus $x$ comes out to be a vector of size 256 × 1. This step reduces the feature set of 256 × 5 = 1280 values into a feature set of just 256 values for each image. PPCA has been reported to prevent overfitting of data during classification, particularly for images. Moreover, it helps in modelling high-dimensional data with relatively few parameters, and hence PPCA has been chosen for dimensionality reduction over regular PCA in this paper. The set of features was reduced to a single column vector of size 256 × 1, which took the minimum time to classify images when compared with a feature set of two or more columns and the same number of rows (obtained if PPCA is used to reduce the feature set to a matrix of size 256 × 2, 256 × 3, and so on), while the change in accuracy was negligible. PPCA is therefore used to reduce the feature set to a single column, that is, a 256 × 1 vector.
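As an illustration of the reduction from five columns to one, the sketch below fits PPCA with the closed-form maximum-likelihood solution and returns the posterior mean of the latent variable for every row. It is a sketch under stated assumptions, not the authors' procedure: the weight matrix is estimated from the data rather than predefined, the 256 histogram bins are treated as observations and the five levels as variables, and the name `ppca_reduce` is hypothetical.

```python
import numpy as np


def ppca_reduce(features, n_components=1):
    """Closed-form maximum-likelihood PPCA; returns posterior-mean latents."""
    y = np.asarray(features, dtype=float)
    mu = y.mean(axis=0)                       # nonzero-mean term
    centered = y - mu
    cov = centered.T @ centered / y.shape[0]  # sample covariance (p x p)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    q = n_components
    # Isotropic noise variance: mean of the discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    w = eigvecs[:, :q] * scale                # p-by-q weight matrix
    m = w.T @ w + sigma2 * np.eye(q)
    # Posterior mean of the latent variable for each observation row.
    latent = np.linalg.solve(m, w.T @ centered.T).T
    return latent, w, mu, sigma2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_matrix = rng.random((256, 5))     # stand-in for the 256 x 5 features
    reduced, w, mu, sigma2 = ppca_reduce(feature_matrix)
    print(reduced.shape, w.shape)
```

Setting `n_components` to 2 or 3 would give the 256 × 2 and 256 × 3 alternatives against which the single-column reduction was compared.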
2.3. Classification of MR Images
The feature set obtained is then fed into a classifier. Two different classifiers, a nearest neighbor (KNN) classifier and an ANN classifier, have been used for performance measurement. A support vector machine (SVM) in its basic form separates only two classes and is therefore not used here. A brief overview of these classifiers is presented below.
2.3.1. Nearest Neighbor Classifier
In this method, the given input image is assigned to the class of its closest training vectors. The $k$-nearest neighbor classifier is a nonparametric supervised classifier which performs better when an optimal value of $k$ is chosen. In the training phase a given feature vector is mapped to one predefined class out of the four classes given in Section 3. During the testing phase, any feature vector is classified by determining the lowest Euclidean distance to one of the four classes of biomedical images.
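A nearest neighbor rule of this kind can be sketched with the standard library alone. The function name, the toy two-dimensional vectors, and the tie-breaking behaviour are illustrative assumptions; the paper does not state the value of $k$ it used, so the sketch defaults to a single nearest neighbor.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=1):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Euclidean distance from the query to every training vector.
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for the reduced 256-element feature vectors.
    train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    labels = ["NT", "NT", "CLE", "CLE"]
    print(knn_classify(train, labels, [0.2, 0.1]))       # NT
    print(knn_classify(train, labels, [5.5, 5.0], k=3))  # CLE
```

With real data each training vector would be the reduced feature vector of one image and each label one of the four classes.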
2.3.2. Feed-Forward Artificial Neural Network Classifier
A feed-forward artificial neural network with one hidden layer has been used as a classifier. The hidden layer has 10 neurons, and the output layer has four neurons to distinguish between the four classes. The weights are initialized randomly, supervised learning is used to train the network, and the weights are updated to map a given feature vector to its known class. After the training phase is over, the network is applied to the testing dataset and the classification accuracy is measured.
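The network shape described above (one hidden layer of 10 neurons and four outputs) can be sketched in NumPy as follows. Only the layer sizes, the random initialization, and the use of supervised training come from the text; the tanh hidden activation, softmax outputs, cross-entropy loss, plain gradient descent, and learning rate are assumptions made for this sketch, as are all function names.

```python
import numpy as np


def init_network(n_inputs, n_hidden=10, n_outputs=4, seed=0):
    """Randomly initialise weights for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, (n_hidden, n_outputs)),
        "b2": np.zeros(n_outputs),
    }


def forward(params, x):
    """Return class probabilities (softmax) for each row of x."""
    hidden = np.tanh(x @ params["w1"] + params["b1"])
    logits = hidden @ params["w2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def train_step(params, x, labels, lr=0.1):
    """One gradient-descent step on the cross-entropy loss (in place)."""
    hidden = np.tanh(x @ params["w1"] + params["b1"])
    grad_logits = forward(params, x)
    grad_logits[np.arange(len(labels)), labels] -= 1.0
    grad_logits /= len(labels)
    # Backpropagate through the output layer before updating its weights.
    grad_hidden = (grad_logits @ params["w2"].T) * (1.0 - hidden ** 2)
    params["w2"] -= lr * hidden.T @ grad_logits
    params["b2"] -= lr * grad_logits.sum(axis=0)
    params["w1"] -= lr * x.T @ grad_hidden
    params["b1"] -= lr * grad_hidden.sum(axis=0)


if __name__ == "__main__":
    net = init_network(n_inputs=256)          # one input per reduced feature
    probabilities = forward(net, np.zeros((3, 256)))
    print(probabilities.shape)                # (3, 4)
```

The predicted class of an image is the index of the largest of its four output probabilities, for example `forward(net, x).argmax(axis=1)`.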
2.4. Proposed Algorithm
The block diagram of the proposed feature extraction method for classification of MR images is shown in Figure 2.
We consider Read_image() as a function to read an image of a given format, two_dimensional_wavelet_decomp() as a function to compute the two-dimensional wavelet decomposition of the input image, and Histogram() as a function which computes the histogram of the input image. Also, Probabilistic_principal_component_analysis() reduces the 256 × 5 feature matrix to a 256 × 1 vector, as described in Section 2.2. The proposed algorithm can then be summarized as in Algorithm 1.
The novelty of the proposed work lies in decomposing a given image using Daubechies-4 wavelet decomposition up to five levels and forming the feature vector from the histograms of the decomposed images. No existing algorithm forms a feature vector from the concatenated histograms of five decomposed images. Moreover, PPCA is applied to reduce the size of the feature vector while retaining the information needed for classification. A highly informative feature vector of comparatively small length is thereby obtained.
3. Experimental Results and Performance Analysis
All simulation work has been carried out on a computer with 4 GB RAM, a 2.0 GHz processor, and MATLAB version 2017a on the Windows platform. The results are obtained for two datasets, one of CT Scan images and another of MR images. The first dataset has CT Scan images of 39 subjects (9 never-smokers, 10 smokers, and 20 smokers with chronic obstructive pulmonary disease (COPD or emphysema)). The images are from the upper, middle, and lower parts of the lungs. A total of 115 slices were used for training and testing. For the calculation of accuracy, four classes are defined: normal tissue (NT), centrilobular emphysema (CLE), paraseptal emphysema (PSE), and panlobular emphysema (PLE). Each category represents one class of emphysema, and each class also corresponds to the extent of emphysema: NT represents no emphysema, CLE indicates a lesser extent of emphysema, and PSE and PLE indicate higher extents of emphysema [26, 32, 33]. Figure 3 shows CT Scan images from dataset 1 for the top view of the lungs.
The second dataset consists of MR images of the heart. It covers 49 subjects in four groups: patients with heart failure with infarcts, patients with heart failure without infarct, patients suffering from hypertrophy, and normal subjects without any heart disease, as shown in Figure 4.
The configuration of two datasets is given in Table 1.
The superiority of this algorithm over others is demonstrated in Figures 5 and 6. Figure 5(a) shows the middle view of the lungs of a patient suffering from CLE, and Figure 6(a) shows the middle view of the lungs of a patient suffering from PSE. When only one level of wavelet decomposition is applied to these images, we obtain Figures 5(b) and 6(b), respectively. Figures 5(b) and 6(b) are very similar, which makes it evident that two images belonging to different classes of emphysema are difficult to distinguish after a single level of decomposition. Figures 5(c)–5(f) are obtained by decomposing Figure 5(b) one further level at a time, so that Figure 5(f) is the fifth level decomposition of Figure 5(a). Similarly, decomposing Figure 6(b) one level at a time gives Figures 6(c)–6(f), and thus Figure 6(f) corresponds to the fifth level wavelet decomposition of Figure 6(a).
If we now compare Figures 5(f) and 6(f), we find a significant difference between the two images. Thus the difference between the images increases at each level. When the histograms of all these images are concatenated, we therefore obtain a more discriminative feature vector than the one obtained by single level decomposition.
The performance of the proposed feature set is compared with three similar classification algorithms in terms of accuracy, that is, the percentage of test images that are classified correctly. The performance of the proposed algorithm (Prop. Algo.) against the El-Dahshan et al. algorithm [23], the Saritha et al. algorithm [24], and the Sorensen et al. algorithm [26] is summarized in multiple tables. The comparisons are made for different ratios of training and testing data. All results are in percentages. For dataset 1, in the case of the 75–25 percent ratio, 86 images are used for training and 29 images for testing. In the 80–20 percent ratio, 92 images are used for training and 23 for testing, whereas in the 85–15 percent ratio, 98 images are used for training and 17 for testing. For dataset 2, in the case of the 75–25 percent ratio, 135 images are used for training and 45 images for testing. In the 80–20 percent ratio, 144 images are used for training and 36 for testing, whereas in the 85–15 percent ratio, 153 images are used for training and 27 for testing. The four algorithms are compared for two classifiers, that is, KNN and ANN, for the 75–25, 80–20, and 85–15 percent ratios of training and testing data in Tables 2, 3, and 4, respectively. The comparisons in these tables are made for both datasets.
In all the cases the proposed algorithm is more accurate than the other three algorithms, regardless of the ratio of training to testing data or the type of classifier used. The comparison of the four algorithms in terms of running time (seconds) is summarized in Table 5.
As we can see in Table 5, the proposed algorithm takes much less time than [23], even though [23] gave equal accuracy in a few cases. The algorithms in [24, 26] take less time to execute but are much inferior to the proposed algorithm in terms of accuracy in almost all the cases.
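The split sizes quoted above follow from rounding the training share of each dataset to the nearest whole image, and accuracy is the percentage of test images labelled correctly. A minimal sketch, with hypothetical function names and rounding to the nearest image as an assumption that reproduces the quoted counts:

```python
def split_counts(total_images, train_percent):
    """Return (train, test) image counts for a given training percentage."""
    train = round(total_images * train_percent / 100)
    return train, total_images - train


def accuracy_percent(predicted, actual):
    """Percentage of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    for total in (115, 180):                  # dataset 1 and dataset 2
        for percent in (75, 80, 85):
            print(total, percent, split_counts(total, percent))
    # 115 images: (86, 29), (92, 23), (98, 17)
    # 180 images: (135, 45), (144, 36), (153, 27)
    print(accuracy_percent(["NT", "CLE", "PSE", "PLE"],
                           ["NT", "CLE", "PSE", "NT"]))  # 75.0
```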
4. Conclusion and Future Scope
In this paper, a multilevel wavelet transform based feature matrix has been proposed for classification of CT Scan images and MR images. The feature set is extracted by concatenating the histograms of the images obtained by decomposing the original image through the wavelet transform up to five levels. The extracted feature set is used with two classifiers, that is, KNN and ANN. The feature set gives better results for both classifiers, and thus it can be claimed that the proposed feature vector is robust. For the proposed method, an increased accuracy of 3.3%, 12.75%, and 13.75% is achieved using KNN with respect to the techniques of [23], [24], and [26], respectively. Similarly, an increased accuracy of 6.22%, 13.9%, and 14.1% is achieved using ANN with respect to the techniques of [23, 24, 26], respectively. Thus it can also be claimed that the proposed feature set is more effective in terms of accuracy for multiple classifiers when compared to the other three algorithms.
As future work, the proposed feature set can also be tested with other classifiers like random forest, deep neural networks, and so forth and with different medical images.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
References
N. Saravanan and K. I. Ramachandran, “Incipient gear box fault diagnosis using discrete wavelet transform (DWT) for feature extraction and classification using artificial neural network (ANN),” Expert Systems with Applications, vol. 37, no. 6, pp. 4168–4181, 2010.
T. Ramakrishnan and B. Sankaragomathi, “A professional estimate on the computed tomography brain tumor images using SVM-SMO for classification and MRG-GWO for segmentation,” Pattern Recognition Letters, Elsevier, 2017.
R.-M. Chen, S.-C. Yang, and C.-M. Wang, “MRI brain tissue classification using unsupervised optimized extenics-based methods,” Computers & Electrical Engineering, Elsevier, 2017.
K. Polat, S. Şahan, H. Kodaz, and S. Güneş, “Breast cancer and liver disorders classification using artificial immune recognition system (AIRS) with performance evaluation by fuzzy resource allocation mechanism,” Expert Systems with Applications, vol. 32, no. 1, pp. 172–183, 2007.
S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1999.
D. Gupta and S. Choubey, “Discrete wavelet transform for image processing,” International Journal of Emerging Technology and Advanced Engineering, vol. 4, no. 3, pp. 598–602, 2015.
N. Sverzellati, D. A. Lynch, M. Pistolesi et al., “Physiologic and quantitative computed tomography differences between centrilobular and panlobular emphysema in COPD,” Chronic Obstructive Pulmonary Diseases: Journal of the COPD Foundation, vol. 1, no. 1, pp. 125–132, 2014.