Abstract
Accurate diagnosis of pathological brain images is important for patient care, particularly in the early phase of the disease. Although numerous studies have used machine-learning techniques for the computer-aided diagnosis (CAD) of pathological brains, previous methods have had limited diagnostic efficiency owing to deficiencies in the choice of filtering techniques, neuroimaging biomarkers, and learning models. Magnetic resonance imaging (MRI) provides enhanced information about soft tissues; therefore, MR images are used in the proposed approach. In this study, we propose a new model that combines Wiener filtering for noise reduction, the 2D discrete wavelet transform (2D-DWT) for feature extraction, probabilistic principal component analysis (PPCA) for dimensionality reduction, and a random subspace ensemble (RSE) classifier with the k-nearest neighbors (KNN) algorithm as the base classifier to classify brain images as pathological or normal. The proposed method provides a significant improvement in classification results compared with other studies. Based on cross-validation (CV), the proposed method outperforms 21 state-of-the-art algorithms in terms of classification accuracy, sensitivity, and specificity on all four datasets used in the study.
1. Introduction
Magnetic resonance imaging (MRI) of the brain provides comprehensive information for diagnosis [1]. It is essential because it is noninvasive and safe and provides a resolution that cannot be obtained by other techniques. MRI is mainly used to diagnose disorders such as strokes, tumors, bleeding, injuries, blood-vessel diseases or infections, and multiple sclerosis (MS). Early diagnosis of pathological brain disease, including its prodromal stage, is critical and can slow or halt disease progression [2]. Therefore, classifying brain status as normal or pathological from MRIs is essential in clinical medicine, as MRI focuses on soft-tissue anatomy and generates a large and detailed dataset about the subject’s brain. However, the large volume of data makes manual interpretation of brain images tedious, time-consuming, and costly. The major drawback of the manual approach is its irreproducibility. Therefore, automated image analysis tools such as computer-aided diagnosis (CAD) systems are needed [3].
Considerable research has been carried out to develop automatic tools that classify MR images as normal or pathological. El-Dahshan et al. [4] used a three-level discrete wavelet transform (DWT) followed by principal component analysis (PCA) to reduce the number of features, and obtained a good success rate using feed-forward back-propagation neural networks (BPNNs) and k-nearest neighbor (KNN) classification. Zhang and Wu [5] recommended a kernel support vector machine (KSVM) and presented three new kernels (homogeneous polynomial, inhomogeneous polynomial, and Gaussian radial basis) for distinguishing between normal and abnormal images. Patnaik et al. [6] used the DWT to obtain approximation coefficients and then performed classification with a support vector machine (SVM). Zhang et al. [7] trained a feed-forward neural network (FNN) with the scaled conjugate gradient (SCG) method. Kundu et al. [8] combined the ripplet transform (RT) for feature extraction, PCA for dimensionality reduction, and the least-squares SVM (LSSVM) for classification; under 5 × 5 stratified cross-validation (SCV), this approach achieved high classification accuracy. El-Dahshan et al. [9] used a feedback pulse-coupled neural network for preprocessing MR images, the DWT for feature extraction, PCA for feature reduction, and a feed-forward back-propagation neural network (FBPNN) to classify brains as pathological or normal. Damodharan and Raghavan [10] used wavelet entropy as the feature space and then applied a naïve Bayes classifier. Wang et al. [11] used the stationary wavelet transform (SWT) in place of the DWT and proposed a hybridization of particle swarm optimization (PSO) and the artificial bee colony method (HPA) to obtain the optimal weights and biases of an FNN. Nazir et al. [12] applied denoising as a first step and achieved an overall classification accuracy of 91.8%. 
Harikumar and Vinoth Kumar [13] used wavelet energy features and an SVM. Padma and Sukanesh [14] used combined wavelet statistical features to segment and classify Alzheimer’s disease (AD) as well as benign and malignant tumor slices. Zhang et al. [15] used Hu moment invariants (HMI) and a generalized eigenvalue proximal SVM (GEPSVM) to detect pathological brains in MRI scans, obtaining an accuracy of 98.89%, a sensitivity of 99.29%, and a specificity of 92.00%. Later, Zhang et al. [16] used a multilayer perceptron (MLP) for classification, in which two pruning techniques, dynamic pruning (DP) and Bayesian detection boundaries (BDB), were used to find the optimal number of hidden neurons, and an adaptive real-coded biogeography-based optimization (ARCBBO) method was used to determine the optimal weights; they obtained accuracies of 98.12% and 98.24%, respectively. Nayak et al. [17] used the 2D-DWT, PCA, and the AdaBoost algorithm with random forest as its base classifier, obtaining an accuracy of 98.44% for classifying pathological brain MR images on DS255. Later, Nayak et al. [18] used the two-dimensional SWT, a symmetric uncertainty ranking (SUR) filter, and AdaBoost with an SVM classifier to detect pathological brain MR images, obtaining an accuracy of 98.43% on DS255. Wang et al. [19] used pseudo Zernike moments (PZM) and a linear regression classifier to classify Alzheimer’s disease, achieving an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Alam et al. [20] used the dual-tree complex wavelet transform (DTCWT), principal component analysis (PCA), and a twin support vector machine (TSVM) for Alzheimer’s disease classification and obtained an accuracy of .
Scholars have proposed various feature-extraction methods for pathological brain detection [21]. All of the methods above achieved promising results, indicating that the 2D-DWT is effective for feature extraction in pathological brain detection. However, two problems remain. (1) Most of them use traditional PCA for feature reduction, which is computationally intensive for large, high-dimensional datasets. (2) The classification performance can be further improved, because the feature vector contains excessive features, which require more memory, increase computational complexity, and make training the classifiers too time-consuming.
To address these problems, we propose a new pathological brain detection system for brain MR images that offers potential improvements over other schemes. A Wiener filter is used to preprocess the images. The proposed method uses the 2D-DWT for feature extraction because of its ability to analyze images at different scales. PPCA is used in place of PCA for feature reduction; it offers efficient dimensionality reduction through an explicit probability model of the latent variables, maximum-likelihood parameter estimates, the ability to handle missing data, and the possibility of combining multiple PCA models as a probabilistic mixture. A relatively new classifier, the random subspace ensemble (RSE) classifier, is employed; it has a lower computational burden than traditional classifiers. Hence, the novelty of the proposed method lies in the application of PPCA features and the RSE classifier.
The article is organized as follows: Section 2 presents details about the materials and methods. Section 3 describes the experimental results, evaluation procedure, and discussions. Finally, Section 4 presents the conclusion and future research.
2. Materials and Methods
2.1. Materials
Four benchmark datasets (DSs), DS66, DS90, DS160, and DS255, are available, containing 66, 90, 160, and 255 images, respectively. All of them contain axial, T2-weighted, 256 × 256-pixel MR images downloaded from the Harvard Medical School (Boston, MA, USA) website (URL: http://www.med.harvard.edu/aablib/home.html). T2-weighted images were selected as input because T2-weighted (spin-spin) relaxation gives better image contrast, which helps show different anatomical structures clearly. T2-weighted images are also better than T1-weighted images for detecting lesions.
We selected five slices from each subject. For healthy subjects, the slices were selected at random. For pathological subjects, the slices had to contain lesions, as confirmed by radiologists with ten years of experience. A sample of diseased slices is shown in Figure 2. In this investigation, all diseases are treated as pathological, so our task is a binary classification problem: distinguishing pathological brains from healthy brains. The whole brain is used as the input image. We did not select local characteristics such as points and edges; instead, we extract global image characteristics, which are then learned by the new cascade model. Note that our procedure differs from that of neuroradiologists, who usually take local features and compare them with a standard template to check whether foci exist, such as shrinkage, expansion, bleeding, and inflammation. Our technique, in contrast, is like AlphaGo: the researcher gives the machine sufficient data, and the machine learns to classify on its own. Including patient information (age, gender, handedness, memory tests, education, etc.) could add information and may therefore help improve classification performance. Nevertheless, the new model proposed here depends only on imaging data; moreover, the imaging data from the website do not include subject information.
The cost of misclassifying a pathological brain as normal is severe, because the subject may be told that she/he is normal and may therefore ignore the mild symptoms displayed, postponing treatment. In contrast, the cost of misclassifying a healthy brain as pathological is low, because the correct diagnosis can be reached by other diagnostic means. Because the original data were accessible, the cost-sensitivity (CS) problem was addressed by changing the class distribution at the outset: we purposely included more pathological brains than healthy ones in the dataset so that the classifier would be biased toward pathological brains. Overfitting was monitored using cross-validation.
In our experiment, DS66 and DS160, which are widely used for brain MR image classification, consist of normal brain images and abnormal brain images from seven types of disease: glioma, meningioma, Alzheimer’s disease, Alzheimer’s disease plus visual agnosia, Pick’s disease, sarcoma, and Huntington’s disease. DS90 contains MR images of healthy brains and of brains with AIDS dementia, Alzheimer’s disease plus visual agnosia, Alzheimer’s disease, cerebral calcinosis, cerebral toxoplasmosis, Creutzfeldt-Jakob disease, glioma, herpes encephalitis, Huntington’s disease, Lyme encephalopathy, meningioma, metastatic adenocarcinoma, metastatic bronchogenic carcinoma, motor neuron disease, MS, Pick’s disease, and sarcoma.
The fourth dataset, DS255, adds images of four further types of disease to the seven disease types and the normal brain images described above. The four additional diseases are chronic subdural hematoma, cerebral toxoplasmosis, herpes encephalitis, and MS.
2.2. Proposed Methodology
The proposed method comprises four stages: image preprocessing, feature extraction using the 2D-DWT, feature reduction using PPCA, and classification using the RSE classifier. To enhance the quality of the MR images, a Wiener filter is applied; approximation coefficients are then extracted from the MR images using a three-level 2D-DWT and saved as the primary features. Next, PPCA is employed to obtain an uncorrelated, discriminative set of features. Finally, the reduced features are classified using the RSE classifier with KNN as the base classifier. The complete block diagram of the proposed system is shown in Figure 1. Each of the four stages is briefly described below.
2.2.1. Preprocessing Using Wiener Filter
The GIF images were downloaded individually from the Harvard Medical School website, and each was then converted manually into JPG format. The images, which were in RGB format, were converted into grayscale intensity images, and the intensity images were then converted to double precision. Acquired brain MR images require preprocessing to improve their quality and thus yield better features. In our study, we used the popular Wiener filter method.
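The study performed these conversion steps in MATLAB. A minimal Python sketch of the grayscale-and-double step follows, assuming a uint8 RGB input and the luminance weights documented for MATLAB's `rgb2gray` (0.2989, 0.5870, 0.1140); the function name `to_gray_double` is hypothetical. Unlike MATLAB, which returns a uint8 image from `rgb2gray` before the conversion to double, this sketch applies the weights in floating point, so values may differ slightly.

```python
import numpy as np

# Luminance weights documented for MATLAB's rgb2gray (assumed here).
RGB_WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def to_gray_double(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB image to a float64 grayscale image in [0, 1]."""
    if rgb_uint8.ndim != 3 or rgb_uint8.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    rgb = rgb_uint8.astype(np.float64) / 255.0   # scale 0..255 to 0..1 (double precision)
    return rgb @ RGB_WEIGHTS                      # weighted sum over the channel axis
```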
The Wiener filter is used in place of a finite impulse response (FIR) filter to reduce noise in signals [22]. When an image is blurred by a known low-pass filter (LPF), the image can be recovered by inverse filtering. However, inverse filtering is extremely sensitive to additive noise. Wiener filtering achieves an optimal trade-off between inverse filtering and noise smoothing: it removes the additive noise and inverts the blurring simultaneously, minimizing the overall mean-square error of the combined inverse filtering and noise smoothing. The Wiener filter produces a linear estimate of the original image and is based on a stochastic framework. By the orthogonality principle, the Wiener filter in the Fourier domain can be expressed as follows:

$$G(u,v) = \frac{H^{*}(u,v)\,S_{xx}(u,v)}{\lvert H(u,v)\rvert^{2}\,S_{xx}(u,v) + S_{\eta\eta}(u,v)}$$

Here, $S_{xx}(u,v)$ is the power spectrum of the original image, $S_{\eta\eta}(u,v)$ is the power spectrum of the additive noise, $H(u,v)$ is the blurring filter, and the asterisk denotes the complex conjugate.
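A frequency-domain Wiener filter of the form $G = H^{*}S_{xx}/(\lvert H\rvert^{2}S_{xx} + S_{\eta\eta})$ can be sketched in NumPy as follows. The sketch assumes a known blurring kernel (point spread function) and replaces the ratio $S_{\eta\eta}/S_{xx}$ by a single constant `nsr`, a common simplification; dividing numerator and denominator by $S_{xx}$ gives $G = H^{*}/(\lvert H\rvert^{2} + \text{nsr})$. The function names and the circular-convolution blur are illustrative assumptions, not the routine used in the study.

```python
import numpy as np

def blur(image, psf):
    """Simulate blurring by circular convolution with a point spread function."""
    H = np.fft.fft2(psf, s=image.shape)          # zero-padded transfer function
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))

def wiener_deconvolve(blurred, psf, nsr):
    """Frequency-domain Wiener filter G = conj(H) / (|H|^2 + nsr).

    nsr stands in for S_eta / S_xx and is assumed constant over frequency;
    nsr = 0 reduces to plain inverse filtering.
    """
    H = np.fft.fft2(psf, s=blurred.shape)
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(G * np.fft.fft2(blurred)))
```

Larger `nsr` values suppress more noise at the cost of less deblurring, which is the trade-off between inverse filtering and noise smoothing described above.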
2.3. 2D-DWT
2.3.1. Advantage of Wavelet Transform
The Fourier transform (FT) is the most commonly used tool for signal analysis; it decomposes a time-domain signal into constituent sinusoids of various frequencies, thus transforming the signal from the time domain to the frequency domain. However, the FT has a serious disadvantage: it discards the time information of the signal. For instance, an investigator cannot determine from a Fourier spectrum when a specific event took place. As a result, classification accuracy can decrease because the time information is lost.
Gabor modified the FT to analyze only a small section of the signal at a time. This approach is known as windowing or the short-time FT (STFT) [23]; it applies a window of suitable shape to the signal. The STFT can be considered a compromise between time information and frequency information. However, the precision of this information is limited by the size of the window.
The wavelet transform (WT) is the next logical step: it uses windows of variable size. The progression of these signal-analysis methods is shown in Figure 3. Another benefit of the WT is that it uses a “scale” in place of the traditional “frequency”; that is, it produces a time-scale view of a signal rather than a time-frequency view. The time-scale view is an alternative way of representing data and is widely used and effective.
2.3.2. DWT
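The DWT is an efficient implementation of the WT that uses dyadic scales and positions [24]. Its fundamentals are as follows. Let $x(t)$ be a square-integrable function. The continuous WT of $x(t)$ relative to a real-valued wavelet $\psi(t)$ is defined as

$$W_{\psi}(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}^{*}(t)\,dt, \qquad \psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{1}$$

where $W_{\psi}(a,b)$ is the WT, $\psi_{a,b}(t)$ is obtained from $\psi(t)$ by dilation and translation, the variable $a$ is the dilation factor, and $b$ is the translation factor (both real and positive numbers). Here, the asterisk (*) denotes the complex conjugate.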
Equation (1) can be discretized by restricting $a$ and $b$ to a discrete lattice ($a = 2^{j}$ and $b = 2^{j}k$) to obtain the DWT, which is given as follows:

$$ca_{j}(k) = \mathrm{DS}\left[\sum_{n} x(n)\, l_{j}^{*}\!\left(n - 2^{j}k\right)\right], \qquad cd_{j}(k) = \mathrm{DS}\left[\sum_{n} x(n)\, h_{j}^{*}\!\left(n - 2^{j}k\right)\right] \tag{2}$$

Here, $ca_{j}$ and $cd_{j}$ are the coefficients of the approximation and detail components, respectively; $l(n)$ and $h(n)$ denote the LPF and high-pass filter (HPF), respectively; $j$ and $k$ are the wavelet scale and translation factors, respectively; and the DS operator denotes downsampling. The approximation component contains the low-frequency components of the image, whereas the detail components contain the high-frequency components. Figure 4 shows a three-level decomposition tree.
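As a concrete instance of filtering followed by downsampling, the Haar wavelet (the wavelet selected later in this study) uses the low-pass filter $(1/\sqrt{2}, 1/\sqrt{2})$ and the high-pass filter $(1/\sqrt{2}, -1/\sqrt{2})$, so each level reduces to scaled pairwise sums and differences of neighboring samples. The pure-Python sketch below assumes an input whose length is divisible by $2^{J}$ for $J$ levels; the function names are hypothetical, and the sign convention of the detail coefficients may differ from that of other toolboxes.

```python
import math

def haar_dwt_1d(x):
    """One Haar DWT level: low-pass/high-pass filtering plus downsampling by 2."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(half)]  # low-frequency part
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(half)]  # high-frequency part
    return approx, detail

def haar_wavedec(x, levels):
    """Multilevel decomposition: only the approximation is split again.

    Returns (ca_J, [cd_J, ..., cd_1]), coarsest detail first.
    """
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt_1d(approx)
        details.insert(0, d)
    return approx, details
```

Because only the approximation branch is split further, the coefficient lengths halve at each level, as in the three-level decomposition tree.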
2.3.3. 2D-DWT
For 2D images, the DWT is applied to each dimension separately, which yields four subband images (LL, LH, HL, and HH) at each scale. The LL subband is used for the next level of the 2D-DWT and can be regarded as the approximation component of the image, whereas the LH, HL, and HH subbands can be regarded as its detail components. As the decomposition level increases, a more compact but coarser approximation component is obtained. Thus, wavelets provide a simple hierarchical framework for interpreting image information. A sample of a pathological brain MR image with its three-level wavelet decomposition is shown in Figure 5.
There are various types of wavelets, for example, Daubechies, symlets 1, coiflets 1, biorthogonal, and reverse biorthogonal 1.1 wavelets. We tested each type of wavelet family, as shown in Table 2. In our research, the approximation coefficients of a three-level decomposition with the Haar wavelet yielded the most promising results among the wavelet families tested; hence, the Haar wavelet was selected for the experiments. It is also the simplest and one of the most widely used wavelets. Moreover, it is computationally fast and can extract basic structural information from an image. The features are extracted for all the images and assembled into a feature matrix.
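The feature extraction described above can be sketched in NumPy. Because the orthonormal Haar low-pass filter combines neighboring pairs with weight $1/\sqrt{2}$, applying it along rows and then columns makes each LL coefficient equal to half the sum of a 2 × 2 block. Three repetitions turn a 256 × 256 image into a 32 × 32 = 1024-element approximation vector, and stacking these vectors gives the feature matrix. The function names are hypothetical; the study itself used MATLAB, and this sketch computes only the LL subband, not the detail subbands.

```python
import numpy as np

def haar_ll(image):
    """One level of the orthonormal 2-D Haar DWT, keeping only the LL subband.

    Low-pass filtering rows and columns with (1/sqrt(2), 1/sqrt(2)) and
    downsampling by 2 gives (a + b + c + d) / 2 for each 2 x 2 block.
    """
    h, w = image.shape
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    blocks = image.reshape(h // 2, 2, w // 2, 2)   # blocks[i, :, j, :] is one 2 x 2 tile
    return blocks.sum(axis=(1, 3)) / 2.0

def haar_features(image, levels=3):
    """Approximation coefficients after `levels` Haar decompositions, flattened."""
    ll = np.asarray(image, dtype=np.float64)
    for _ in range(levels):
        ll = haar_ll(ll)                            # the LL band feeds the next level
    return ll.ravel()

def feature_matrix(images, levels=3):
    """One row of wavelet features per image (N x 1024 for 256 x 256 inputs)."""
    return np.vstack([haar_features(img, levels) for img in images])
```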
2.4. Probabilistic Principal Component Analysis
The PPCA algorithm proposed by Tipping et al. [36–38] estimates the principal axes even when input vectors have one or more missing values. PPCA reduces high-dimensional data to a lower-dimensional representation by relating a $p$-dimensional observation vector $\mathbf{y}$ to a $k$-dimensional latent (unobserved) variable $\mathbf{x}$, which is normally distributed with zero mean and covariance $\mathbf{I}_{k}$. PPCA relies on an isotropic error model. The relationship is

$$\mathbf{y}^{\top} = \mathbf{W}\mathbf{x}^{\top} + \boldsymbol{\mu} + \boldsymbol{\varepsilon}$$

where $\mathbf{y}$ is the row vector of observed variables, $\mathbf{x}$ is the row vector of latent variables, and $\boldsymbol{\varepsilon}$ is the isotropic error term. The error term $\boldsymbol{\varepsilon}$ is Gaussian with zero mean and covariance $\sigma^{2}\mathbf{I}_{p}$, where $\sigma^{2}$ is the residual variance. For the residual variance to be greater than 0, $k$ must be smaller than the rank of the data. Standard PCA, in which $\sigma^{2}$ equals 0, is the limiting case of PPCA. The observed variables $\mathbf{y}$ are conditionally independent given the values of the latent variables $\mathbf{x}$. Therefore, the latent variables explain the correlations between the observed variables, and the error accounts for the variability unique to each $y_{j}$. The $p \times k$ matrix $\mathbf{W}$ relates the latent and observed variables, and the vector $\boldsymbol{\mu}$ allows the model to have a nonzero mean. PPCA assumes that values are missing at random throughout the dataset. From this model,

$$\mathbf{y} \sim \mathcal{N}\!\left(\boldsymbol{\mu},\; \mathbf{W}\mathbf{W}^{\top} + \sigma^{2}\mathbf{I}_{p}\right)$$

Because the maximum-likelihood estimates of $\mathbf{W}$ and $\sigma^{2}$ cannot be determined analytically when values are missing, we used the expectation-maximization (EM) algorithm to iteratively maximize the corresponding log-likelihood function. The EM algorithm treats missing values as additional latent variables. At convergence, the columns of $\mathbf{W}$ span the solution subspace, and PPCA returns the orthonormal coefficients.
In our study, the image size is 256 × 256. After three-level decomposition, the feature vector has 32 × 32 = 1024 elements. Not all of these features are relevant for classification, and using all of them has a high computational cost; we therefore used PPCA for dimensionality reduction. The advantage of PPCA over PCA is its computational efficiency.
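The EM fit can be sketched in NumPy for the complete-data case (no missing values), using the EM updates given by Tipping and Bishop: with sample covariance $S$ and $M = W^{\top}W + \sigma^{2}I$, $W_{\text{new}} = SW(\sigma^{2}I + M^{-1}W^{\top}SW)^{-1}$ and $\sigma^{2}_{\text{new}} = \operatorname{tr}(S - SWM^{-1}W_{\text{new}}^{\top})/p$. This is an illustrative sketch, not the MATLAB routine used in the study: it does not handle missing values, and the function name, the random initialization, and the fixed iteration count are assumptions. The orthonormal coefficients are taken as the left singular vectors of the fitted $W$.

```python
import numpy as np

def ppca_em(Y, k, n_iter=500, seed=0):
    """Fit PPCA by EM on an N x p data matrix Y without missing values.

    Returns W (p x k), the residual variance sigma2, the mean mu,
    latent scores E[x | y] (N x k), and orthonormal coefficients (p x k).
    """
    N, p = Y.shape
    mu = Y.mean(axis=0)
    Yc = Y - mu
    S = Yc.T @ Yc / N                               # sample covariance (p x p)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((p, k))                 # random initialization (assumption)
    sigma2 = 1.0
    I_k = np.eye(k)
    for _ in range(n_iter):
        Minv = np.linalg.inv(W.T @ W + sigma2 * I_k)
        SW = S @ W
        W_new = SW @ np.linalg.inv(sigma2 * I_k + Minv @ W.T @ SW)
        sigma2 = np.trace(S - SW @ Minv @ W_new.T) / p
        W = W_new
    Minv = np.linalg.inv(W.T @ W + sigma2 * I_k)
    scores = Yc @ W @ Minv                          # posterior means, one row per sample
    coeff = np.linalg.svd(W, full_matrices=False)[0]  # orthonormal basis of span(W)
    return W, sigma2, mu, scores, coeff
```

For the features in this study, one would call `ppca_em(F, 13)` on a hypothetical N × 1024 feature matrix `F` and pass `scores` to the classifier; the value 13 comes from Section 3.2.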
2.5. RSE Classifier
Ensemble classification combines multiple classifiers to obtain more accurate predictions than those of the individual models, and ensemble learning techniques are considered very useful for improving prediction accuracy. However, the base classifiers must be as accurate and diverse as possible to increase the generalization capability of the ensemble.
To classify brain MR images as normal or pathological, we used a random subspace classifier with KNN as the base classifier. The success of ensemble classification rests on the diversity of the classifiers that make up the ensemble: each classifier makes errors on different instances, so combining them yields a stronger classifier with lower error. The random subspace classifier is a machine-learning classifier that builds multiple subspaces of the full feature space, each consisting of features randomly selected from the original feature space. The decision boundaries of the individual base classifiers must differ significantly. To achieve this, an unstable or weak classifier is used as the base classifier, because such classifiers produce sufficiently varied decision boundaries even for small perturbations of the training data.
We used KNN as the base classifier owing to its simplicity. For each randomly selected subspace, a new KNN classifier is trained. The outputs of all base classifiers are then combined by majority voting to decide the class membership of each test sample.
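The random subspace procedure with a KNN base classifier and majority voting can be sketched in NumPy as follows. The number of base learners (31, odd so that binary votes cannot tie), the default subspace dimension (half the features), and k = 1 are illustrative assumptions, not parameters reported in the study; labels must be non-negative integers (for example, 0 = normal and 1 = pathological).

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1):
    """Plain k-nearest-neighbor classifier with Euclidean distance."""
    # squared distances between every test sample and every training sample
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]          # indices of the k closest samples
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])

def random_subspace_knn(X_train, y_train, X_test, n_learners=31,
                        subspace_dim=None, k=1, seed=0):
    """Random subspace ensemble: each KNN sees a random subset of features,
    and the ensemble output is the majority vote of all base classifiers."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    m = subspace_dim or max(1, n_features // 2)
    votes = np.empty((n_learners, X_test.shape[0]), dtype=int)
    for t in range(n_learners):
        feats = rng.choice(n_features, size=m, replace=False)   # one random subspace
        votes[t] = knn_predict(X_train[:, feats], y_train, X_test[:, feats], k)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```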
2.6. Pseudocode of Proposed System
Our proposed system can be outlined in four major stages. The steps involved are depicted in Pseudocode 1.

2.7. Performance Measures
Various measures are used to evaluate the classifier’s performance, all based on the final confusion matrix, which holds the correct and incorrect classification results. Table 1 illustrates a confusion matrix for binary classification, where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively.
Following convention, pathological brains are assumed to hold the value “true,” and normal control (NC) brains the value “false.” The performance of the proposed approach is evaluated in terms of sensitivity, specificity, accuracy, and precision as follows.
(i) Sensitivity (true positive rate): the ability of the test to give a positive result when the person has the disease:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
(ii) Specificity (true negative rate): the ability of the test to give a negative result when the person does not have the disease:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
(iii) Accuracy: the proportion of all diagnostic tests that are correct:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
(iv) Precision and recall are given by
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
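These measures can be computed directly from the confusion-matrix counts. A minimal sketch follows; the function names are hypothetical, and NaN is returned when a denominator is zero (for example, sensitivity on a fold that contains no pathological images).

```python
import math

def _ratio(num, den):
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else math.nan

def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, precision, and recall from TP/TN/FP/FN."""
    return {
        "sensitivity": _ratio(tp, tp + fn),              # true positive rate
        "specificity": _ratio(tn, tn + fp),              # true negative rate
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),                   # identical to sensitivity
    }
```

For instance, hypothetical counts TP = 8, TN = 9, FP = 1, and FN = 2 give a sensitivity of 0.8, a specificity of 0.9, an accuracy of 0.85, and a precision of 8/9 ≈ 0.889.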
2.8. Cross-Validation
Cross-validation (CV) is a model-assessment method used to evaluate how well a machine-learning algorithm predicts on a new dataset on which it has not been trained; it helps to detect and reduce overfitting. Each CV round randomly partitions the original dataset into a training set and a validation set. An illustration of k-fold CV is shown in Figure 6. The training set is used to train a supervised learning algorithm, while the validation set is used to evaluate its performance.
To obtain a reliable estimate of how well the RSE classifier generalizes to independent datasets, 5 × 6-fold stratified cross-validation (SCV) is employed for DS66, and 5 × 5-fold SCV is used for DS90, DS160, and DS255. For DS66, 55 MR images are used for training, whereas 75, 128, and 204 images are used for DS90, DS160, and DS255, respectively. The numbers of validation images for DS66, DS90, DS160, and DS255 are 11, 15, 32, and 51, respectively.
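The repeated stratified CV scheme can be sketched in plain Python. Each class is shuffled and dealt round-robin into the folds, with the dealing position carried over between classes so that fold sizes differ by at most one; each repetition reshuffles with a different seed. The function names, the seeding scheme, and this particular dealing strategy are assumptions for illustration, not the study's implementation.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, seed=0):
    """Split sample indices into n_folds disjoint folds whose class
    proportions approximate those of the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    pos = 0                                  # carried across classes to balance fold sizes
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % n_folds].append(i)
            pos += 1
    return folds

def repeated_stratified_cv(labels, n_folds, n_repeats=5, seed=0):
    """Yield (train_indices, validation_indices) for n_repeats x n_folds CV."""
    for r in range(n_repeats):
        folds = stratified_folds(labels, n_folds, seed=seed + r)
        for f in range(n_folds):
            train = [i for g in range(n_folds) if g != f for i in folds[g]]
            yield train, folds[f]
```

With 66 labels and `n_folds=6`, every split has 55 training and 11 validation indices, matching the DS66 setting above; with 255 labels and `n_folds=5`, every split has 204 training and 51 validation indices.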
3. Results and Discussion
In this study, we implemented the new machine-learning framework in MATLAB R2016a on a computer with an Intel Core i5 processor and 16 GB of RAM running Windows 7. The program can be run on any computer platform where MATLAB is available.
3.1. Feature Extraction and Optimum Wavelet
In the proposed system, the three-level 2D-DWT with the Haar wavelet decomposes the input image into 10 subbands, as illustrated in Figure 5. The top-left corner of the wavelet coefficient image (Figure 5) holds the approximation coefficients of the three-level decomposition, whose size is only 32 × 32 = 1024. These coefficients form the initial features. This feature vector is still large and needs to be reduced, so it is passed as input to PPCA.
3.2. Feature Reduction
PPCA reduces the features to any desired number. Ideally, the retained features should preserve more than 90% of the variance; however, in this study we did not retain 95% of the variance because doing so may lead to a higher computational cost. Researchers have used different numbers of features. In our case, a small number of features gave poor accuracy, whereas 13 principal components gave excellent results. Hence, the proposed method uses 13 principal components to achieve higher classification accuracy.
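The variance criterion mentioned above can be checked with a short helper that returns the smallest number of components whose eigenvalues (component variances) reach a given fraction of the total. This only illustrates the criterion; the study chose 13 components empirically, and the helper's name is hypothetical.

```python
def n_components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose variance share reaches threshold."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for i, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return i
    return len(vals)
```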
3.3. Classification Results
The reduced features were sent to the classifiers, and the results obtained with the different classifiers are promising. The experiments show that the proposed method works well on all four DSs using 13 principal components. The performances obtained with logistic regression, quadratic discriminant analysis, KNN, and the RSE classifier with KNN as the base classifier are shown in Table 3, from which we see that the proposed method outperforms the other methods. We used 5-fold CV for DS90, DS160, and DS255, and 6-fold CV for DS66. The RSE classifier achieved accuracies of 100.00%, 100.00%, 100.00%, and 99.20% on DS66, DS90, DS160, and DS255, respectively. The cubic SVM obtained the same results as the RSE classifier on every dataset except DS66, where it achieved only 98.50%.
3.4. Comparison with Existing Schemes
To further demonstrate the effectiveness of the proposed approach, we compared it with 21 existing algorithms. The algorithms and their results are listed in Tables 4 and 5. Table 4 shows the comparison on DS90: our proposed method correctly classified all cases, with 100% sensitivity, 100% specificity, 100% precision, and 100% accuracy, which is superior to the other algorithms. This demonstrates the effectiveness of the preprocessing technique combined with features extracted using the WT and PPCA. Table 4 also shows the results of 5 runs of the proposed system. Table 5 compares the results on the three DSs in terms of the number of features, number of runs, and average accuracy; some recent schemes were run 10 times, and others five times. Tables 4 and 5 show that most techniques achieved excellent classification on DS66, as it is smaller. However, none of the algorithms achieved 100.00% on DS255, because DS255 is larger and includes more types of diseased brains; therefore, no current CAD system achieves perfect classification on it.
Finally, the proposed “DWT + PPCA + RSE” achieved an accuracy of 100% on DS66, DS90, and DS160 and 99.20% on DS255, which is comparable with other recent studies and higher than all the algorithms presented in Table 5. The improvement achieved by the proposed scheme may appear marginal compared with other schemes, but it was obtained through careful statistical analysis (five repetitions of k-fold CV); thus, we consider this improvement reliable and robust.
4. Conclusion
This paper proposed a new cascade model, “2D-DWT + PPCA + RSE,” for detecting pathological brains. The experiments validated its effectiveness, with accuracies of 100% on DS66, DS90, and DS160 and 99.20% on DS255. Our contributions lie in three points. First, we introduced the Wiener filter and showed its effectiveness. Second, we replaced PCA with PPCA for feature reduction. Third, we introduced the RSE classifier and showed that the resulting system performs better than other state-of-the-art algorithms. In this work, we transformed the pathological brain detection (PBD) problem into a binary classification task, and the experiments showed the superiority of our method over existing approaches.
The proposed algorithm could also be applied in other fields, for example, face recognition, breast cancer detection, and fault detection. However, the method has been validated only on publicly available datasets, which are limited in size. In addition, the images in the selected datasets were collected at the middle and late stages of disease; images of early-stage disease still need to be considered.
In future research, we may consider images from other modalities, such as MRSI, PET, and CT, to increase the robustness of our scheme. Once enough brain images have been collected from medical institutes, the proposed method can be validated on a larger clinical dataset using modern machine-learning techniques such as deep learning and extreme learning. The Internet of Things is another promising field in which to embed this pathological brain detection system (PBDS).
Nomenclature
MR(I):  Magnetic resonance (imaging) 
DWT:  Discrete wavelet transform 
PPCA:  Probabilistic principal component analysis 
KNN:  k-nearest neighbor 
CV:  Cross-validation 
BPNN:  Back-propagation neural network 
KSVM:  Kernel support vector machine 
SCG:  Scaled conjugate gradient 
LSSVM:  Least-squares support vector machine 
FBPNN:  Feed-forward back-propagation neural network 
SWT:  Stationary wavelet transform 
PSO:  Particle swarm optimization 
CAD:  Computer-aided diagnosis 
STFT:  Short-time Fourier transform 
QDA:  Quadratic discriminant analysis 
SUR:  Symmetric uncertainty ranking 
PZM:  Pseudo Zernike moment 
DTCWT:  Dual-tree complex wavelet transform 
RBFNN:  Radial basis function neural network 
CT:  Computed tomography 
TSVM:  Twin support vector machine 
HMI:  Hu moment invariants 
MLP:  Multilayer perceptron 
ARCBBO:  Adaptive real-coded biogeography-based optimization 
DP:  Dynamic pruning 
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the Brain Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT & Future Planning (NRF2014M3C7A1046050). This work was also supported by a National Research Foundation of Korea grant funded by the Korean Government (NRF2017R1A2B4006533).