Abstract

Parkinson's disease (PD) is a common neurodegenerative disease that has attracted increasing attention, and many artificial intelligence methods have been used for its diagnosis. In this study, an enhanced fuzzy k-nearest neighbor (FKNN) method for the early detection of PD based on vocal measurements was developed. The proposed method, an evolutionary instance-based learning approach termed CBFO-FKNN, couples FKNN with chaotic bacterial foraging optimization with Gauss mutation (CBFO). The CBFO technique efficiently resolves the parameter tuning issues of FKNN. The effectiveness of the proposed CBFO-FKNN was rigorously evaluated on two PD datasets in terms of classification accuracy, sensitivity, specificity, and AUC (area under the receiver operating characteristic curve). The simulation results indicated that, in a 10-fold cross-validation scheme, the proposed approach outperformed five other FKNN models based on BFO, particle swarm optimization, genetic algorithms, fruit fly optimization, and the firefly algorithm, as well as three advanced machine learning methods: support vector machine (SVM), SVM with local learning-based feature selection, and kernel extreme learning machine. The proposed method shows good potential as a tool to help clinicians make better decisions in clinical diagnosis.

1. Introduction

Parkinson's disease (PD), a degenerative disorder of the central nervous system, is the second most common neurodegenerative disease [1]. The number of people with PD has increased rapidly worldwide [2], especially in developing countries in Asia [3]. Although its underlying cause is unknown, the symptoms associated with PD can be significantly alleviated if it is detected in the early stages of illness [4–6]. PD is characterized by tremor, rigidity, slowed movement, motor symptom asymmetry, and impaired posture [7, 8]. Research has shown that phonation and speech disorders are also common among PD patients [9]. In fact, phonation and speech disorders can appear as many as five years before a clinical diagnosis of PD [10]. The voice disorders associated with PD include dysphonia, an impairment in vocal fold vibration, and dysarthria, a difficulty in correctly articulating speech phonemes [11, 12]. Little et al. [13] first attempted to identify PD patients from dysphonic indicators using a combination of support vector machines (SVM), efficient learning machines, and a feature selection approach. Their results indicated that the method identified PD patients efficiently using only four dysphonic features.

Inspired by the results obtained by Little et al. [13], many other researchers have applied machine learning techniques to diagnose PD on the same dataset (hereafter the Oxford dataset). In [14], Das compared the classification scores of artificial neural networks (ANN), DMneural, regression, and decision trees for PD diagnosis; the ANN classifier yielded the best result of 92.9%. In [15], Åström et al. designed a parallel feed-forward neural network system that yielded an improvement of 8.4% in PD classification. In [16], Sakar et al. proposed a method that combined SVM with feature selection based on mutual information to detect PD and obtained a classification accuracy of 92.75%. In [17], a PD detection method developed by Li et al. using SVM and a fuzzy-based nonlinear transformation method yielded a maximum classification accuracy of 93.47%. In another study, Shahbaba et al. [18] compared the classification accuracy of a nonlinear model combining Dirichlet processes and multinomial logit models with those of decision trees and support vector machines; the nonlinear model yielded the highest classification score of 87.7%. In [19], Psorakis et al. put forward novel convergence methods and model improvements for multiclass relevance vector machines (mRVMs); the improved model achieved an accuracy of 89.47%. In [20], Guo et al. proposed a PD detection method combining genetic programming and the expectation maximization algorithm (GP-EM), with a maximum classification accuracy of 93.1%. In [21], Luukka used a similarity classifier and a feature selection method based on fuzzy entropy measures to detect PD, achieving a mean classification accuracy of 85.03%. In [22], Ozcift et al. presented rotation forest ensemble classifiers with correlation-based feature selection to identify PD patients; the proposed model yielded a maximum classification accuracy of 87.13%. In [23], Spadoto et al.
used a combination of evolutionary-based techniques and the Optimum-Path Forest (OPF) classifier to detect PD, with a maximum classification accuracy of 84.01%. In [24], Polat integrated fuzzy C-means clustering-based feature weighting (FCMFW) into a KNN classifier, which yielded a PD classification accuracy of 97.93%. In [25], Chen et al. combined a fuzzy k-nearest neighbor (FKNN) classifier with principal component analysis (PCA-FKNN) to detect PD; the proposed diagnostic system yielded a maximum classification accuracy of 96.07%. In [26], Zuo et al. developed a PSO-enhanced FKNN-based PD diagnostic system with a mean classification accuracy of 97.47%. In [27–29], Babu et al. proposed a projection-based learning meta-cognitive radial basis function network (PBL-McRBFN) approach for the prediction of PD, which obtained testing accuracies of 96.87% on gene expression data sets, 99.35% on standard vocal data sets, 84.36% on gait PD data sets, and 82.32% on magnetic resonance images. In [30], a hybrid intelligent system for PD detection was proposed that combined several feature preprocessing methods with three supervised classifiers, namely least-squares SVM, probabilistic neural networks, and general regression neural networks; it achieved a maximum classification accuracy of 100%. Furthermore, in [31], Gök et al. developed a rotation forest ensemble KNN classifier with a classification accuracy of 98.46%. In [32], Shen et al. proposed an enhanced SVM based on the fruit fly optimization algorithm and achieved 96.90% classification accuracy for PD diagnosis. In [33], Peker combined minimum redundancy maximum relevance (mRMR) feature selection with a complex-valued artificial neural network to diagnose PD and obtained a classification accuracy of 98.12%. In [34], Chen et al.
proposed an efficient hybrid kernel extreme learning machine with a feature selection approach. The experimental results showed that the method achieved a highest classification accuracy of 96.47% and a mean accuracy of 95.97% over 10 runs of 10-fold CV. In [35], Cai et al. proposed an optimized support vector machine (SVM) based on bacterial foraging optimization (BFO), combined with relief feature selection, to predict PD; the experimental results demonstrated that the framework achieved excellent performance, with a classification accuracy of 97.42%.

Different from the work of Little et al., Sakar et al. [36] designed voice experiments with sustained vowels, words, and sentences from PD patients and controls. They reported that sustained vowels had more PD-discriminative power than isolated words and short sentences, and achieved 77.5% accuracy with an SVM classifier. Since then, several works have been proposed to detect PD using this dataset (hereafter the Istanbul dataset). Zhang et al. [37] proposed a PD classification algorithm that integrated a multi-edit-nearest-neighbor algorithm with an ensemble learning algorithm, which achieved higher classification accuracy and stability than the other algorithms compared. Abrol et al. [38] proposed a kernel sparse greedy dictionary algorithm for classification tasks and compared it with the kernel K-singular value decomposition algorithm and the kernel multilevel dictionary learning algorithm. Their method achieved an average classification accuracy of 98.2% and a best accuracy of 99.4% on the Istanbul PD dataset with multiple types of sound recordings. In [39], the authors investigated six classification algorithms (AdaBoost, support vector machines, a multilayer perceptron (MLP) neural network, an ensemble classifier, K-nearest neighbor, and naive Bayes) together with four feature selection algorithms (LASSO, minimal redundancy maximal relevance, relief, and local learning-based feature selection) on the Istanbul PD dataset. They found that feature selection greatly increased classification accuracy, and that the SVM and KNN classifiers with local learning-based feature selection achieved the best prediction performance and execution times.

As shown above, ANN and SVM have been extensively applied to the detection of PD. However, their black-box nature makes their underlying decision-making processes difficult to understand. Compared to ANN and SVM, FKNN is much simpler and yields more easily interpretable results. FKNN classifiers [40, 41], improved versions of the traditional k-nearest neighbor (KNN) classifier, have been studied extensively since they were first proposed and have been used for diagnostic purposes. In recent years, many variants of KNN based on fuzzy set theory and its extensions have been developed, such as KNN methods based on fuzzy rough sets, intuitionistic fuzzy sets, type-2 fuzzy sets, and possibilistic theory [42]. FKNN represents imprecise knowledge by introducing fuzzy measures, which provides a powerful way to describe the similarity among instances. In FKNN, fuzzy set theory is introduced into KNN: instead of being assigned to a single class, an instance receives a degree of membership in each class, computed from the distances to its k nearest neighbors. Thus, each instance is assigned class membership values rather than a binary label. In the voting stage, the class with the highest membership value is selected. Owing to these properties, FKNN has been applied to numerous practical problems, such as medical diagnosis [25, 43], protein identification and prediction [44, 45], bankruptcy prediction [46], slope collapse prediction [47], and grouting activity prediction [48].

The classification performance of FKNN relies heavily on its two tuning parameters, the neighborhood size (k) and the fuzzy strength (m). Therefore, these two parameters should be determined precisely before FKNN is applied to practical problems. Several studies concerning parameter tuning in FKNN have been conducted. In [46], Chen et al. presented a particle swarm optimization (PSO)-based method to automatically search for the two tuning parameters of FKNN; the results showed that the method could be applied effectively and efficiently to bankruptcy prediction problems. More recently, Cheng et al. [48] developed a differential evolution optimization approach to determine the most appropriate tuning parameters of FKNN and successfully applied it to grouting activity prediction in the construction industry. Later, Cheng et al. [47] proposed using the firefly algorithm to tune the hyperparameters of the FKNN model, which was then applied to slope collapse prediction; the experimental results indicated that the method outperformed other common algorithms. The bacterial foraging optimization (BFO) method [49], a relatively new swarm-intelligence algorithm, mimics the cooperative foraging behavior of bacteria in a multidimensional continuous search space and thereby balances exploration and exploitation effectively. Since its introduction, BFO has been applied to many real-world optimization problems [50–55], such as optimal controller design [49], stock market index prediction [56], automatic circle detection in digital images [57], harmonic estimation [58], active power filter design [59], and especially the parameter optimization of machine learning methods [60–63]. In [60], BFO was introduced into wavelet neural network training and applied successfully to load forecasting.
In [61], an improved BFO algorithm was proposed to fine-tune the parameters of fuzzy support vector machines to identify fatigue status from electromyography signals; the experimental results showed that the method is an effective tool for diagnosing fatigue status. In [62], BFO was used to learn the structure of Bayesian networks; the experimental results verified that BFO is a viable alternative for learning Bayesian network structures and is highly competitive with state-of-the-art algorithms. In [63], BFO was employed to optimize the training parameters of an adaptive neuro-fuzzy inference system for speed control of a matrix converter (MC)-fed brushless direct current (BLDC) motor; the simulation results showed that the BFO approach was superior to other nature-inspired algorithms. In [64], a chaotic local search-based BFO (CLS-BFO) was proposed, which introduced a differential evolution (DE) operator and a chaotic search operator into the chemotaxis step of the original BFO.

Inspired by the above works, in this paper BFO was integrated with FKNN to maximize classification performance. To further improve the diversity of the bacterial swarm, chaos theory combined with Gaussian mutation was introduced into BFO. The resulting CBFO-FKNN model was then applied to the detection of PD. In our previous work, we applied BFO to the classification of speech signals for PD diagnosis [35]. In this work, we further improve BFO by embedding chaos theory and Gaussian mutation, and combine it with the effective FKNN classifier. To validate the effectiveness of the proposed CBFO-FKNN approach, FKNN models based on five other meta-heuristic algorithms, namely the original BFO, particle swarm optimization (PSO), genetic algorithms (GA), fruit fly optimization (FOA), and the firefly algorithm (FA), were implemented for strict comparison. In addition, advanced machine learning methods, including the support vector machine (SVM), the kernel-based extreme learning machine (KELM), and SVM with local learning-based feature selection (LOGO) [65] (LOGO-SVM), were compared with the proposed CBFO-FKNN model in terms of classification accuracy (ACC), area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The experimental results show that the proposed CBFO-FKNN approach achieves high ACC, AUC, sensitivity, and specificity on both datasets. This work substantially extends our previously published conference paper [66] and presents a further improved method.

The main contributions of this study are as follows:
(a) First, we introduce a BFO enhanced with chaos theory and Gaussian mutation to adaptively determine the two key parameters of FKNN. This helps the FKNN classifier reach its best classification performance more efficiently, and makes it more stable and robust than five other bio-inspired-algorithm-based FKNN models and other advanced machine learning methods such as SVM and KELM.
(b) The resulting model, CBFO-FKNN, is used to discriminate persons with PD from healthy ones on two PD datasets from the UCI machine learning repository. It is promising as a computer-aided decision-making tool for the early detection of PD.

The remainder of this paper is structured as follows. In Section 2, background information regarding FKNN, BFO, chaos theory, and Gaussian mutation is presented. The implementation of the proposed methodology is explained in Section 3. In Section 4, the experimental design is described in detail. The experimental results and a discussion are presented in Section 5. Finally, Section 6 concludes the paper.

2. Background Information

2.1. Fuzzy k-Nearest Neighbor (FKNN)

In this section, a brief description of FKNN is provided; a detailed description can be found in [41]. In FKNN, the fuzzy membership of a query sample x in each class is computed as follows:

$$u_i(x)=\frac{\sum_{j=1}^{K}u_{ij}\left(1/\lVert x-x_j\rVert^{2/(m-1)}\right)}{\sum_{j=1}^{K}\left(1/\lVert x-x_j\rVert^{2/(m-1)}\right)}\tag{1}$$

where i = 1, 2, …, C, j = 1, 2, …, K, C is the number of classes, and K is the number of nearest neighbors. The fuzzy strength parameter m (m > 1) determines how heavily the distance is weighted when calculating each neighbor's contribution to the membership value. In addition, the Euclidean distance ‖x − x_j‖ between x and its jth nearest neighbor x_j is usually selected as the distance metric. Furthermore, u_ij denotes the degree of membership of the training pattern x_j in class i. In this study, the constrained fuzzy membership approach was adopted: the K nearest neighbors of each training pattern x_k were determined, and the membership u_ik of x_k in each class i was assigned as

$$u_{ik}=\begin{cases}0.51+\left(n_i/K\right)\times 0.49, & \text{if } i \text{ is the class of } x_k\\ \left(n_i/K\right)\times 0.49, & \text{otherwise}\end{cases}\tag{2}$$

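Here n_i denotes the number of the K nearest neighbors of x_k that belong to class i. The membership values calculated using (2) should satisfy the following constraints, where n is the number of training samples:

$$\sum_{i=1}^{C}u_{ij}=1,\ j=1,\dots,n;\qquad 0<\sum_{j=1}^{n}u_{ij}<n;\qquad u_{ij}\in[0,1]\tag{3}$$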

After all membership values of a query sample x have been calculated, x is assigned to the class in which it has the highest degree of membership, i.e.,

$$\hat{y}(x)=\arg\max_{i=1,\dots,C}\,u_i(x)\tag{4}$$
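The constrained membership initialization of Eq. (2) and the distance-weighted prediction of Eqs. (1) and (4) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation. It assumes integer class labels 0..C−1 and Euclidean distance, excludes each training pattern from its own neighbor list, and adds a small guard against zero distances. The function names `constrained_memberships` and `fknn_predict` are hypothetical.

```python
import math

def constrained_memberships(X, y, K, n_classes):
    """Membership U[k][i] of training pattern x_k in class i, following Eq. (2).

    Assumes labels 0..n_classes-1, Euclidean distance, and that x_k is not
    counted among its own K nearest neighbours.
    """
    U = []
    for k, xk in enumerate(X):
        others = sorted((math.dist(xk, xj), y[j]) for j, xj in enumerate(X) if j != k)
        counts = [0] * n_classes  # n_i: neighbours of x_k that belong to class i
        for _, label in others[:K]:
            counts[label] += 1
        U.append([0.51 + 0.49 * c / K if i == y[k] else 0.49 * c / K
                  for i, c in enumerate(counts)])
    return U

def fknn_predict(x, X, U, K, m):
    """Memberships u_i(x) of a query (Eq. (1)) and the arg-max class (Eq. (4))."""
    neighbours = sorted(range(len(X)), key=lambda j: math.dist(x, X[j]))[:K]
    n_classes = len(U[0])
    num, den = [0.0] * n_classes, 0.0
    for j in neighbours:
        # Weight 1 / ||x - x_j||^(2/(m-1)); the floor avoids division by zero.
        w = 1.0 / max(math.dist(x, X[j]), 1e-12) ** (2.0 / (m - 1.0))
        den += w
        for i in range(n_classes):
            num[i] += U[j][i] * w
    u = [v / den for v in num]
    return u, max(range(n_classes), key=lambda i: u[i])
```

Take four training points forming two well-separated clusters, `X = [[0, 0], [0, 1], [5, 5], [5, 6]]` with `y = [0, 0, 1, 1]`. Each row of `constrained_memberships(X, y, 2, 2)` sums to 1, and `fknn_predict([0.0, 0.5], X, U, K=3, m=2.0)` assigns the query to class 0.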

2.2. Bacterial Foraging Optimization (BFO)

The bacterial foraging optimization (BFO) algorithm is a nature-inspired optimization algorithm proposed by Passino in 2002 [49]. BFO simulates how bacteria move toward or away from regions while sensing the concentration of surrounding substances during foraging. The method contains four basic behaviors: chemotaxis, swarming, reproduction, and elimination-dispersal.

2.2.1. Chemotaxis

Chemotaxis simulates two different movements of an E. coli bacterium that depend on the rotation of its flagella, namely tumbling and swimming (moving). Tumbling means searching for a new direction, and swimming means continuing in the current direction. Specifically, the bacterium first moves one step in a random direction. If the fitness value at the new position is better than at the previous one, it continues to move in that direction. Otherwise, it tumbles and moves in another random direction. When the maximum number of swim steps is reached, the chemotaxis step stops. The chemotactic step is expressed as

$$\theta^i(j+1,k,l)=\theta^i(j,k,l)+C(i)\frac{\Delta(i)}{\sqrt{\Delta^{T}(i)\Delta(i)}}\tag{5}$$

where θ^i(j, k, l) is the position of the ith bacterium, and j, k, and l index the chemotaxis, reproduction, and elimination-dispersal steps, respectively. C(i) is the chemotactic step length of the ith bacterium, and Δ(i) is a random vector whose elements lie in [−1, 1].
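Eq. (5), together with the tumble-and-swim rule described above, can be sketched as a single chemotactic step for one bacterium. This minimal Python sketch assumes a cost to be minimized, a fixed step length C(i) passed as `step`, and the swim length Ns. It follows the classic BFO convention that the tumble move is always taken and the bacterium keeps swimming only while the cost improves. The names `tumble_direction` and `chemotactic_step` are hypothetical.

```python
import math
import random

def tumble_direction(p, rng):
    """Unit vector Delta(i)/sqrt(Delta(i)^T Delta(i)), Delta(i) uniform in [-1, 1]^p."""
    delta = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(d * d for d in delta))
    return [d / norm for d in delta]

def chemotactic_step(theta, fobj, step, Ns, rng):
    """One tumble plus at most Ns swims along the same direction (cost is minimised)."""
    direction = tumble_direction(len(theta), rng)
    j_last = fobj(theta)
    # Tumble: move one step of length C(i) along the random direction (Eq. (5)).
    theta = [t + step * d for t, d in zip(theta, direction)]
    j_cur = fobj(theta)
    swims = 0
    # Swim: keep moving the same way only while the cost keeps decreasing.
    while swims < Ns and j_cur < j_last:
        swims += 1
        j_last = j_cur
        theta = [t + step * d for t, d in zip(theta, direction)]
        j_cur = fobj(theta)
    return theta, j_cur
```

For example, start a bacterium at `[3.0, 4.0]` on the sphere cost `sum(t * t for t in v)` with `step=0.1` and `Ns=4`. It moves between one and five steps along a single random direction.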

2.2.2. Swarming

During foraging, the bacterial community regulates the attraction and repulsion between cells, so that the bacteria aggregate while each keeps a relatively independent position. Attraction causes the bacteria to cluster together, and repulsion forces them to disperse to relatively independent positions to obtain food.

2.2.3. Reproduction

In the reproduction operation of the BFO algorithm, the fitness values of all positions that each bacterium passes through during chemotaxis are accumulated as its health, and the bacteria are sorted from healthiest to least healthy. The healthier half of the bacteria then each divide into two bacteria by binary fission, and the other half die. As a result, each newly reproduced bacterium has the same foraging ability as its parent, and the population size remains constant.
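The reproduction step can be sketched as follows. The sketch assumes, as in Algorithm 1, that health is the accumulated cost (error), so a lower value means a healthier bacterium. It also assumes an even swarm size. The function name `reproduce` is hypothetical.

```python
def reproduce(population, health):
    """BFO reproduction: the healthier half splits into two copies, the rest die.

    health[i] is the cost accumulated by bacterium i over its chemotactic steps,
    so a lower value means a healthier bacterium (minimisation is assumed).
    """
    S = len(population)
    if S % 2:
        raise ValueError("the swarm size S must be even so that Sr = S/2")
    Sr = S // 2
    order = sorted(range(S), key=lambda i: health[i])  # healthiest first
    survivors = [list(population[i]) for i in order[:Sr]]
    # Each survivor is duplicated, so the swarm size stays constant.
    return survivors + [list(b) for b in survivors]
```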

2.2.4. Elimination-Dispersal

After several generations of reproduction, the bacteria undergo elimination-dispersal with a given probability Ped, and the selected bacteria are randomly redistributed to new positions. Specifically, if a bacterium is selected for elimination-dispersal with probability Ped, it loses its original foraging position and moves to a randomly selected position in the solution space. This promotes the search for the global optimal solution.
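Elimination-dispersal can be sketched as follows. The sketch assumes box bounds `lower` and `upper` for the search space and uniform re-initialization within them. The function name `eliminate_disperse` is hypothetical.

```python
import random

def eliminate_disperse(population, Ped, lower, upper, rng):
    """With probability Ped, move each bacterium to a random point in [lower, upper]."""
    dispersed = []
    for bacterium in population:
        if rng.random() < Ped:  # same test as "if Ped > rand" in Algorithm 1
            dispersed.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
        else:
            dispersed.append(list(bacterium))
    return dispersed
```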

2.3. Chaotic Mapping

Chaos, a widespread nonlinear phenomenon in nature, is characterized by randomness, ergodicity, and sensitivity to initial conditions [67]. Owing to ergodicity and randomness, chaotic motion can traverse all states in a certain range according to its own laws without repetition. Therefore, using chaotic variables in the search is expected to offer advantages over purely random search. The ergodicity of chaos can be used to improve the search and avoid falling into local minima, which has made chaos optimization search a novel optimization technique. Chaotic sequences can be generated by different maps, such as the logistic map, sine map, Singer map, sinusoidal map, and tent map. In this paper, several chaotic maps were tried, and the best one was chosen to combine with the BFO algorithm. In preliminary experiments, the logistic map achieved the best results. Thus, the chaotic sequences are generated by the logistic map as follows:

$$x_{n+1}=u\,x_n\left(1-x_n\right)\tag{6}$$

where u is the control parameter, set to u = 4; at u = 4, the logistic map is fully chaotic. Let x_n ∈ (0, 1) and x_0 ∉ {0.25, 0.5, 0.75}.

The initial bacterial population is mapped to the chaotic sequence generated according to (6), resulting in a corresponding chaotic bacterial population p_ch.
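The logistic map of Eq. (6) and the population initialization described in Step 2 of Algorithm 1 can be sketched as follows. The exact mapping that turns chaotic values into bacterial positions is not spelled out here. This sketch therefore assumes the common linear mapping lower + c·(upper − lower). The seed x0 = 0.3 and the function names `logistic_sequence` and `chaotic_initial_population` are illustrative choices.

```python
def logistic_sequence(x0, n, u=4.0):
    """First n iterates of x_{t+1} = u * x_t * (1 - x_t) (Eq. (6))."""
    if not (0.0 < x0 < 1.0) or x0 in (0.25, 0.5, 0.75):
        raise ValueError("x0 must lie in (0, 1) and avoid 0.25, 0.5, 0.75")
    seq, x = [], x0
    for _ in range(n):
        x = u * x * (1.0 - x)
        seq.append(x)
    return seq

def chaotic_initial_population(population, fobj, lower, upper, x0=0.3):
    """Merge the random swarm with a chaotic swarm and keep the S best (Step 2).

    Assumption: a chaotic value c in (0, 1) maps to lower + c * (upper - lower).
    """
    S, p = len(population), len(lower)
    c = logistic_sequence(x0, S * p)
    chaotic = [
        [lower[d] + c[s * p + d] * (upper[d] - lower[d]) for d in range(p)]
        for s in range(S)
    ]
    merged = [list(b) for b in population] + chaotic
    merged.sort(key=fobj)  # ascending cost: best individuals first
    return merged[:S]
```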

2.4. Gaussian Mutation

The Gaussian mutation operation is derived from the Gaussian normal distribution and has demonstrated its effectiveness in evolutionary search [68], where this approach is referred to as classical evolutionary programming (CEP). Gaussian mutation has been used to enhance the search capabilities of ABC [69], PSO [70], and DE [71]. Because of the narrow tails of the Gaussian distribution, Gaussian mutation is more likely to create a new offspring near the original parent. As a result, the search takes smaller steps, which allows the neighborhood of each solution to be explored more thoroughly, and it is therefore expected to converge relatively quickly. The Gaussian density function is given by

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right)$$

where σ² is the variance for each member of the population.
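The density above can be sketched in Python, along with the way Algorithm 1 applies Gaussian mutation to gbest: multiply it by 1 + randn, then keep the better of the two. This is an illustrative translation, not the authors' code. The function names `gaussian_pdf` and `gaussian_mutation` are hypothetical, and `random.Random.gauss` stands in for MATLAB's `randn`.

```python
import math
import random

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with variance sigma**2."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_mutation(gbest, fobj, rng):
    """Scale gbest by (1 + N(0, 1)) and keep the mutant if it is not worse.

    Mirrors the mutation step of Algorithm 1: a single scalar draw multiplies
    every coordinate, and ties favour the mutant (MATLAB's min returns index 1).
    """
    factor = 1.0 + rng.gauss(0.0, 1.0)
    mutant = [g * factor for g in gbest]
    return mutant if fobj(mutant) <= fobj(gbest) else list(gbest)
```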

3. Proposed CBFO-FKNN Model

In this section, we describe the new evolutionary FKNN model based on the CBFO strategy, in which the two key parameters of FKNN are tuned automatically by CBFO. As shown in Figure 1, the proposed methodology has two main parts: the inner parameter optimization procedure and the outer performance evaluation procedure. The main objective of the inner procedure is to optimize the neighborhood size (k) and the fuzzy strength parameter (m) using CBFO via 5-fold cross-validation (CV). The best values of (k, m) obtained are then fed into the FKNN prediction model to perform the PD diagnostic classification task in the outer loop via 10-fold CV. The classification error rate is used as the fitness function:

$$f=\frac{1}{5}\sum_{i=1}^{5}\mathrm{testError}_i$$

where testError_i is the test error achieved by the FKNN classifier on the ith fold of the inner 5-fold CV.
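The fitness of a candidate (k, m) can be sketched as the mean validation error over five inner folds. To keep the example self-contained, the sketch makes three simplifying assumptions. First, training memberships are crisp, so the FKNN decision reduces to a distance-weighted vote. Second, the folds are interleaved rather than stratified. Third, the continuous k proposed by the optimizer is rounded to the nearest integer. The names `fknn_label` and `fitness` are hypothetical.

```python
import math

def fknn_label(X_train, y_train, x, params):
    """FKNN decision with crisp training memberships (a distance-weighted vote)."""
    k, m = int(round(params[0])), params[1]  # the optimizer proposes a real-valued k
    neighbours = sorted(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))[:k]
    score = {}
    for j in neighbours:
        w = 1.0 / max(math.dist(x, X_train[j]), 1e-12) ** (2.0 / (m - 1.0))
        score[y_train[j]] = score.get(y_train[j], 0.0) + w
    return max(score, key=score.get)

def fitness(X, y, params, n_folds=5):
    """Mean validation error over n_folds inner folds: the cost CBFO minimises."""
    errors = []
    for f in range(n_folds):
        val = set(range(f, len(X), n_folds))  # interleaved split (simplification)
        X_tr = [X[i] for i in range(len(X)) if i not in val]
        y_tr = [y[i] for i in range(len(X)) if i not in val]
        wrong = sum(fknn_label(X_tr, y_tr, X[i], params) != y[i] for i in val)
        errors.append(wrong / len(val))
    return sum(errors) / n_folds
```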

The main steps of the CBFO strategy are detailed in Algorithm 1.

Begin
Step 1: Parameter initialization. Initialize the dimension of the search space p, the swarm size S
(S even, Sr = S/2), the number of chemotactic steps Nc, the swimming length Ns, the number of
reproduction steps Nre, the number of elimination-dispersal events Ned, the elimination-dispersal
probability Ped, and the step size C(i) taken in the random direction specified by a tumble.
Set Gbest = Inf and Intertime = 0.
Step 2: Population initialization. Calculate the chaotic sequence according to Eq. (6). Obtain the
corresponding chaotic bacterial population by mapping the original bacterial population into the
chaotic sequence according to Eq. (7). From the original and the chaotic populations, select the
S best individuals as the initial bacterial population.
Step 3:
for ell=1:Ned                          % elimination-dispersal loop
  for K=1:Nre                          % reproduction loop
    for j=1:Nc                         % chemotaxis loop
      Intertime=Intertime+1;           % number of iterations
      for i=1:S
        % fitness of the ith bacterium at the jth chemotactic, Kth reproduction,
        % and ellth elimination-dispersal step
        J(i,j,K,ell)=fobj(P(:,i,j,K,ell));
        % Jlast keeps this cost so that a better cost found by swimming can be detected
        Jlast=J(i,j,K,ell);
        % gbest(1,:) stores the current bacterial individual
        gbest(1,:)=P(:,i,j,K,ell);
        Tumble according to Eq. (5) to obtain P(:,i,j+1,K,ell)
        J(i,j+1,K,ell)=fobj(P(:,i,j+1,K,ell));
        % swim (for bacteria that seem to be headed in the right direction)
        m=0;                           % counter for swim length
        while m<Ns
          m=m+1;
          if J(i,j+1,K,ell)<Jlast
            Jlast=J(i,j+1,K,ell);
            if Jlast<Gbest
              Gbest=Jlast;             % Gbest stores the best fitness value found so far
              gbest(1,:)=P(:,i,j+1,K,ell);
            end
            Move one more step of Eq. (5) along the same direction to update P(:,i,j+1,K,ell)
            J(i,j+1,K,ell)=fobj(P(:,i,j+1,K,ell));
          else
            m=Ns;
          end
          % Gaussian mutation operation
          Moth_pos_m_gaus=gbest(1,:)*(1+randn(1));
          Moth_fitness_m_gaus=fobj(Moth_pos_m_gaus);
          Moth_fitness_s=fobj(gbest(1,:));
          Moth_fitness_comb=[Moth_fitness_m_gaus,Moth_fitness_s];
          [~,mm]=min(Moth_fitness_comb);
          if mm==1
            gbest(1,:)=Moth_pos_m_gaus;
          end
          fitnessGbest=fobj(gbest(1,:));
          if fitnessGbest<Gbest
            Gbest=fitnessGbest;
          end
        end                            % end of swim
      end                              % go to next bacterium
    end                                % go to next chemotactic step
    % reproduction
    Jhealth=sum(J(:,:,K,ell),2);       % health of each of the S bacteria
    [Jhealth,sortind]=sort(Jhealth);   % sort accumulated costs in ascending order
    % rearrange the bacterial population
    P(:,:,1,K+1,ell)=P(:,sortind,Nc+1,K,ell);
    % split the bacteria: the least fit do not reproduce,
    % and the Sr fittest each split into two identical copies
    for i=1:Sr
      P(:,i+Sr,1,K+1,ell)=P(:,i,1,K+1,ell);
    end
  end                                  % go to next reproduction
  % elimination-dispersal
  for m=1:S
    if Ped>rand                        % randomly generate a new individual anywhere in the solution space
      Reinitialize bacterium m
    end
  end
end                                    % go to next elimination-dispersal event
End

4. Experimental Design

4.1. Oxford Parkinson’s Disease Data

The Oxford Parkinson’s disease data set was donated by Little et al. [13] (hereafter the Oxford dataset). It was used to discriminate patients with PD from healthy controls by detecting differences in vowel sounds. Various biomedical voice measurements were collected from 31 subjects aged 46 to 85 years: 23 patients with PD and 8 healthy controls. Each subject provided an average of six sustained vowel “ahh…” phonations, ranging from 1 to 36 seconds in length [13], yielding 195 samples in total. Each recording was subjected to different measurements, yielding 22 real-valued features. Table 1 lists these 22 vocal features and their statistical parameters.

4.2. Istanbul Parkinson’s Disease Data

The second data set in this study was deposited by Sakar et al. [36] from Istanbul, Turkey (hereafter the Istanbul dataset). It contains multiple types of sound recordings from 68 subjects, including sustained vowels, numbers, words, and short sentences. Specifically, the training data were collected from 40 subjects: 20 patients with PD aged 43 to 77 and 20 healthy subjects aged 45 to 83. The test data were collected from 28 different patients with PD aged 39 to 79. In this study, we selected only the three types of sustained vowel recordings, /a/, /o/, and /u/, which are similar in type to the Oxford data. We merged them into a single database containing a total of 288 sustained vowel samples, on which all analyses were performed. As shown in Table 2, a group of 26 linear and time-frequency-based features is extracted for each voice sample.

4.3. Experimental Setup

The experiments were performed on a Windows 7 platform with an Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.6 GHz and 16 GB of RAM. The CBFO-FKNN, BFO-FKNN, PSO-FKNN, GA-FKNN, FOA-FKNN, FA-FKNN, SVM, and KELM classification models were implemented in MATLAB 2014b. The LIBSVM package [72] was used for SVM classification. The algorithm available at http://www3.ntu.edu.sg/home/egbhuang was used for KELM classification. The CBFO-FKNN method was implemented from scratch. The data were scaled into the range [0, 1] before each classification was conducted.

The parameters C and γ used in the SVM and KELM classifications were determined via the grid search method; the search ranges were defined as and . For CBFO-FKNN, a swarm size of 8, 25 chemotactic steps, a swimming length of 4, 3 reproduction steps, 2 elimination-dispersal events, and an elimination-dispersal probability of 0.25 were selected. The chemotactic step size was determined by trial and error, as shown in the experimental results section. The initial parameters of the other four meta-heuristic algorithms used to train FKNN were also chosen by trial and error, as reported in Table 3.

4.4. Data Classification

A stratified k-fold CV [73] was used to validate the performance of the proposed approach and the other comparative models. In most studies, k is set to 10. In each of the 10 iterations, 90% of the samples form the training set, and the remaining 10% form the test set. The results of all 10 trials are then averaged. The advantage of this method is that the test sets are mutually disjoint, so every sample is tested exactly once, which makes the results more reliable.

A nested stratified 10-fold CV, which has been widely used in previous research, was adopted in this study [74]. The classification performance was evaluated in the outer loop. Because a 10-fold CV was used there, the classifiers were evaluated on one independent fold of data, and the other nine folds were used for training. The parameter optimization was performed in the inner loop. Using a 5-fold CV on the remaining nine folds, CBFO-FKNN searched for the optimal values of k and m, and SVM and KELM searched for the optimal values of C and γ. That is, the nine folds were further split into one fold for performance evaluation and four folds for training.
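The nested scheme can be sketched as index bookkeeping only. Stratified outer folds are used for evaluation, and stratified inner folds, built solely from the outer training part, are used for parameter search. This Python sketch assumes round-robin dealing of shuffled class members to folds. That is one common way to stratify, not necessarily the procedure of [73, 74]. The function names `stratified_folds` and `nested_cv_splits` are hypothetical.

```python
import random

def stratified_folds(y, n_folds, rng):
    """Split indices 0..len(y)-1 into n_folds folds that preserve class proportions."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    pos = 0
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for i in members:  # deal members round-robin across folds
            folds[pos % n_folds].append(i)
            pos += 1
    return folds

def nested_cv_splits(y, rng, outer=10, inner=5):
    """Yield (train, test, inner_splits); inner splits only use the outer training part."""
    for test in stratified_folds(y, outer, rng):
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        inner_splits = []
        for fold in stratified_folds([y[i] for i in train], inner, rng):
            val = [train[i] for i in fold]  # map back to original indices
            val_set = set(val)
            inner_splits.append(([i for i in train if i not in val_set], val))
        yield train, test, inner_splits
```

The optimizer would evaluate each candidate (k, m) on the inner splits only. The outer test fold is touched once, with the selected parameters.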

4.5. Evaluation Criteria

ACC, AUC, sensitivity, and specificity were used to evaluate the performance of the different models. They are defined as

$$\mathrm{ACC}=\frac{TP+TN}{TP+FP+FN+TN}$$

$$\mathrm{Sensitivity}=\frac{TP}{TP+FN}$$

$$\mathrm{Specificity}=\frac{TN}{FP+TN}$$

where TP is the number of true positives, FN the number of false negatives, TN the number of true negatives, and FP the number of false positives. AUC [75] is the area under the ROC curve.
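The three confusion-matrix metrics can be sketched as follows, along with AUC. The sketch computes AUC as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counting one half. It assumes PD is encoded as the positive label 1. This is one standard way to compute AUC, not necessarily the procedure of [75], and the function names are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """ACC, sensitivity and specificity from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    acc = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return acc, sensitivity, specificity

def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, the counts are TP = 2, FN = 1, TN = 1, and FP = 1. This gives ACC = 0.6, sensitivity = 2/3, and specificity = 0.5.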

5. Experimental Results and Discussion

5.1. Benchmark Function Validation

To test the performance of the proposed CBFO algorithm, 23 benchmark functions were used, including unimodal, multimodal, and fixed-dimension multimodal functions. These functions are listed in Tables 4–6, where Dim is the dimension, Range is the search space, and f_min is the optimal value.

To verify the validity of the proposed algorithm, it was compared on these functions with the original BFO, Firefly Algorithm (FA) [76], Flower Pollination Algorithm (FPA) [77], Bat Algorithm (BA) [78], Dragonfly Algorithm (DA) [79], Particle Swarm Optimization (PSO) [80], and an improved BFO called PSOBFO. The parameters of these algorithms were set according to their original papers; the specific values are shown in Table 7. To ensure that the results are not biased, 30 independent runs were performed. In all experiments, the population size was set to 50 and the maximum number of iterations to 500.

Tables 8-10 report the average results (Avg), standard deviations (Stdv), and overall ranks of the different algorithms on F1-F23. The ranking is based on the average result (Avg) of the 30 independent runs on each problem. To compare the convergence performance of the proposed algorithm with that of the other algorithms visually, Figures 2-4 plot convergence curves on a logarithmic scale for typical functions selected from the unimodal, multimodal, and fixed-dimension multimodal groups, respectively. The results on the unimodal functions F1-F7 are shown in Table 8. CBFO performs the same as the improved PSOBFO on F1, F2, F3, and F4 and better than the original BFO. Moreover, the rankings show that CBFO is the best of all the compared algorithms on F1-F7.
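The summary statistics and rankings in Tables 8-10 can be reproduced from raw run results along the lines of the sketch below. The tie rule (equal averages share the best rank, e.g. 1, 1, 3) and computing the overall rank by summing per-function ranks are assumptions; some studies average the ranks instead, which yields the same ordering when every algorithm is ranked on every function.

```python
import statistics


def summarize(runs):
    """runs: {algorithm: [best value found in each independent run]}.
    Returns {algorithm: (Avg, Stdv)} using the sample standard deviation."""
    return {name: (statistics.mean(v), statistics.stdev(v)) for name, v in runs.items()}


def competition_rank(values):
    """Rank algorithms by value, lower is better (minimisation).
    Equal values share the best rank, e.g. 1, 1, 3."""
    ordered = sorted(values, key=values.get)
    ranks = {}
    for pos, name in enumerate(ordered):
        if pos > 0 and values[ordered[pos - 1]] == values[name]:
            ranks[name] = ranks[ordered[pos - 1]]
        else:
            ranks[name] = pos + 1
    return ranks


def overall_rank(per_function_ranks):
    """Sum each algorithm's ranks over all functions, then rank the sums."""
    totals = {}
    for ranks in per_function_ranks:
        for name, r in ranks.items():
            totals[name] = totals.get(name, 0) + r
    return competition_rank(totals)
```

Ranking by Avg alone ignores Stdv, which is why the statistical test below is needed to judge whether a difference in averages is meaningful.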

As the convergence curves in Figure 2 show, the proposed CBFO converges very quickly and outperforms all other methods on F1, F2, F3, F4, F5, and F7. On F1, F2, F3, and F4, CBFO converges within far fewer search steps than the other algorithms; in particular, its curves on these four functions decline rapidly after 250 iterations.

The results on the multimodal functions F8-F13 are listed in Table 9. CBFO attained the exact optimal solutions of the 30-dimensional problems F8 and F12 in all 30 runs. On F9, F10, F11, and F13, CBFO yields solutions very competitive with those of PSOBFO. Based on the overall rankings, CBFO is the best technique, followed by BFO, FA, BA, PSO, FPA, and DA, in that order.

The convergence curves in Figure 3 show the relative superiority of the proposed CBFO on F8, F11, and F12. On F11, CBFO overtakes all its competitors within only a few iterations, whereas FPA, BA, DA, and PSO fail to improve their solutions even over many more iterations.

The results on F14-F23 are listed in Table 10. They show that CBFO outperforms all other methods on F15. On F16, F17, and F19, the optimization performance of all the algorithms is similar. On F20, CBFO performs better than both the original BFO and the improved PSOBFO, and on F18 in particular, it is much better than PSOBFO. Figure 4 shows that CBFO converges faster than the other algorithms on F15, F18, F19, and F20; on F15, it surpasses all methods.

To determine whether the differences between CBFO and its competitors are statistically significant, the Wilcoxon rank-sum test [81] at the 5% significance level was also employed. The p values of the comparisons are reported in Tables 11-13. In each table, p values not lower than 0.05, which indicate that the difference is not significant, are shown in boldface.
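The test can be sketched as follows for the 30-run samples of two algorithms. This version uses the large-sample normal approximation with average ranks for ties and a tie-corrected variance, and no continuity correction; these are implementation choices, not details stated in the paper. SciPy's `scipy.stats.ranksums` and `scipy.stats.mannwhitneyu` are library alternatives. Note the degenerate case: when two algorithms reach the same value in every run (as CBFO and PSOBFO apparently do on F1-F4), every observation is tied, the variance is zero, and no difference can be detected, so p = 1.

```python
import math


def ranksum_test(x, y):
    """Two-sided Wilcoxon rank-sum test with the normal approximation.

    Ties receive average ranks and the variance is tie-corrected; no
    continuity correction is applied. Returns (z, p_value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:  # all observations identical: no evidence of a difference
        return 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return z, math.erfc(abs(z) / math.sqrt(2))
```

A p value below 0.05 rejects the hypothesis that both samples come from the same distribution; the sign of z tells which algorithm tends to reach lower values.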

Table 11 gives the p values for F1-F7. All of them are below 0.05 except those for PSOBFO on F1, F2, F3, and F4, which confirms that the improvements achieved by CBFO are statistically significant. In Table 12 (F8-F13), all p values are below 0.05 except that for PSOBFO on F11; hence, the results of CBFO are statistically better than those of the other methods. The p values in Table 13 show that CBFO is significantly better than PSOBFO, FPA, BA, and PSO on F14-F23.

These results demonstrate that the chaotic mapping strategy and Gaussian mutation used in CBFO significantly improve the efficacy of the classical BFO. On the one hand, applying chaotic mapping to the initialization of the bacterial population speeds up the initial exploration of the algorithm. On the other hand, applying Gaussian mutation to the current best bacterium during the iterations helps the algorithm escape from local optima. In conclusion, the embedded strategies allow CBFO to strike a better balance between exploration and exploitation.
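The two strategies can be sketched as follows. The text does not specify which chaotic map or which mutation formula CBFO uses, so the logistic map (μ = 4), the multiplicative form x(1 + N(0, σ²)) with σ = 1, clipping to the bounds, and greedy acceptance of the mutant are all assumptions made for illustration; the function names and the bounds in the usage note are likewise hypothetical.

```python
import random


def logistic_map_population(pop_size, lb, ub, z0=0.7, mu=4.0):
    """Chaotic initialisation: iterate the logistic map z <- mu * z * (1 - z)
    (chaotic for mu = 4 and z0 not in {0, 0.25, 0.5, 0.75, 1}) and scale each
    value into [lb_d, ub_d]."""
    z = z0
    population = []
    for _ in range(pop_size):
        individual = []
        for lo, hi in zip(lb, ub):
            z = mu * z * (1.0 - z)
            individual.append(lo + z * (hi - lo))
        population.append(individual)
    return population


def gaussian_mutation(best, lb, ub, sigma=1.0, rng=random):
    """Perturb each coordinate as x * (1 + N(0, sigma^2)), clipped to the bounds."""
    return [min(max(x * (1.0 + rng.gauss(0.0, sigma)), lo), hi)
            for x, lo, hi in zip(best, lb, ub)]


def mutate_best(best, best_cost, cost, lb, ub, rng=random):
    """Greedy step: keep the mutant only if it lowers the cost."""
    candidate = gaussian_mutation(best, lb, ub, rng=rng)
    candidate_cost = cost(candidate)
    if candidate_cost < best_cost:
        return candidate, candidate_cost
    return best, best_cost
```

For example, with hypothetical (k, m) bounds lb = [1, 0.1] and ub = [20, 10], `logistic_map_population(50, lb, ub)` spreads 50 initial bacteria over the box deterministically but irregularly, and repeated calls to `mutate_best` can only keep or improve the current best. Because the mutation is multiplicative, a coordinate that is exactly 0 is left unchanged, which an additive variant would avoid.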

5.2. Results on the Parkinson’s Disease Datasets

Many studies have demonstrated that the performance of BFO can be heavily affected by the chemotaxis step size C(i). Therefore, we also investigated the effect of C(i) on the performance of CBFO-FKNN. Table 14 shows the detailed results of the CBFO-FKNN model with different values of C(i) on the two datasets, listing the mean results with their standard deviations in parentheses. CBFO-FKNN performed best with C(i) = 0.1 on the Oxford dataset, achieving an average accuracy of 96.97%, an AUC of 0.9781, a sensitivity of 96.87%, and a specificity of 98.75%, and with C(i) = 0.2 on the Istanbul dataset, achieving an average accuracy of 83.68%, an AUC of 0.6513, a sensitivity of 96.92%, and a specificity of 33.33%. These two settings also yielded the most reliable results, with the smallest standard deviations, on the Oxford and Istanbul datasets, respectively. Therefore, C(i) was set to 0.1 for the Oxford dataset and 0.2 for the Istanbul dataset in the subsequent experiments.
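The role of C(i) is easiest to see in the chemotactic move of classical BFO, sketched below: a bacterium tumbles in a random unit direction Δ/√(ΔᵀΔ), moves a distance C(i), and keeps swimming in that direction while its cost improves. A larger C(i) therefore explores more aggressively, and a smaller one refines locally. This is a simplified sketch: keeping the last improving position, the swim limit n_swim = 4, and the idea that θ would encode (k, m) with cost being the inner-CV error rate in CBFO-FKNN are assumptions, not details given in the text.

```python
import math
import random


def chemotaxis_step(theta, cost, step_size, n_swim=4, rng=random):
    """One chemotactic step of one bacterium (classical BFO, simplified).

    Tumble: move step_size along a random unit direction. Swim: keep moving
    in that direction, up to n_swim more times, while the cost decreases.
    Returns the final position and its cost.
    """
    delta = [rng.uniform(-1.0, 1.0) for _ in theta]
    norm = math.sqrt(sum(d * d for d in delta)) or 1.0
    unit = [d / norm for d in delta]

    def move(point):
        return [p + step_size * u for p, u in zip(point, unit)]

    position = move(theta)  # the tumble is always taken
    current = cost(position)
    for _ in range(n_swim):
        candidate = move(position)
        candidate_cost = cost(candidate)
        if candidate_cost >= current:
            break  # stop swimming as soon as the cost stops improving
        position, current = candidate, candidate_cost
    return position, current
```

Each step moves the bacterium a multiple of C(i) along one direction, so with C(i) = 0.1 versus 0.2 the same number of iterations covers half the distance but probes the landscape twice as finely.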

Tables 15 and 16 show the ACC, AUC, sensitivity, specificity, and optimal (k, m) pair of each fold obtained by CBFO-FKNN with C(i) = 0.1 on the Oxford dataset and C(i) = 0.2 on the Istanbul dataset, respectively. Each fold has a different (k, m) pair, since the parameters for each fold were determined automatically by CBFO. With its optimal parameter pair, FKNN achieved different classification performance in each fold. This is attributable to CBFO adaptively tuning the two parameters to the specific data distribution of each fold.
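To show what the tuned pair (k, m) controls, the sketch below implements fuzzy k-NN in the style of Keller et al.: the k nearest neighbours vote for their classes with weights 1/d^(2/(m−1)), so k sets the neighbourhood size and m the fuzzy strength. A value of m close to 1 lets the nearest neighbour dominate, while a large m makes the weights nearly uniform (plain majority voting). Crisp training memberships (each training sample belongs fully to its own class) and the function name `fknn_predict` are assumptions; the paper's FKNN may initialise memberships differently.

```python
import math


def fknn_predict(train_x, train_y, x, k, m, eps=1e-12):
    """Fuzzy k-NN with crisp training memberships (Keller et al.-style).

    The membership of x in class c is the inverse-distance-weighted share of
    its k nearest neighbours that belong to c, with weights
    1 / d^(2 / (m - 1)). Returns (predicted_label, {class: membership}).
    """
    if k < 1 or m <= 1:
        raise ValueError("need k >= 1 and m > 1")
    neighbours = sorted(
        (math.dist(x, xj), yj) for xj, yj in zip(train_x, train_y)
    )[:k]
    # eps avoids division by zero when a neighbour coincides with x.
    weights = [1.0 / max(d, eps) ** (2.0 / (m - 1.0)) for d, _ in neighbours]
    total = sum(weights)
    memberships = {c: 0.0 for c in set(train_y)}
    for w, (_, yj) in zip(weights, neighbours):
        memberships[yj] += w / total
    label = max(memberships, key=memberships.get)
    return label, memberships
```

The membership of the PD class can also serve as the continuous score needed to compute AUC.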

To investigate the convergence behavior of the proposed CBFO-FKNN method, the classification error rate was recorded at each iteration. Taking the Oxford dataset as an example, Figures 5(a)-5(d) display the learning curves of CBFO-FKNN for folds 1, 3, 5, and 7 of the 10-fold CV, respectively. All four fitness curves of CBFO converged to a global optimum in fewer than 20 iterations: they improved gradually from iteration 1 to iteration 20 and showed no significant improvement afterwards, and the runs stopped at 50 iterations (the maximum number of iterations). The error rates decreased rapidly at the beginning of the evolutionary process and then continued to decrease slowly; during the latter part of the process, the curves remained stable until the stopping criterion, the maximum number of iterations, was met. Thus, the proposed CBFO-FKNN model converged efficiently toward the global optimum.

To validate the effectiveness of the proposed method, the CBFO-FKNN model was compared with five FKNN models based on other metaheuristic algorithms and with three other advanced machine learning approaches: SVM, KELM, and SVM with local learning-based feature selection (LOGO-SVM). As shown in Figure 6, CBFO-FKNN outperformed its competitors in terms of ACC, AUC, and sensitivity on the Oxford dataset. CBFO-FKNN yields the highest average ACC, 96.97%, followed by PSO-FKNN, LOGO-SVM, KELM, SVM, FOA-FKNN, FA-FKNN, and BFO-FKNN, while GA-FKNN performs worst of all the methods. On the AUC metric, CBFO-FKNN obtained results similar to those of FA-FKNN, followed by FOA-FKNN, GA-FKNN, PSO-FKNN, BFO-FKNN, KELM, and LOGO-SVM, while SVM performed worst. On the sensitivity metric, CBFO-FKNN has a clear advantage; LOGO-SVM ranks second, followed by KELM, SVM, PSO-FKNN, FOA-FKNN, FA-FKNN, and GA-FKNN, while BFO-FKNN performs worst. On the specificity metric, FA-FKNN achieved the highest result, GA-FKNN and FOA-FKNN achieved similar results and ranked second, followed by BFO-FKNN, PSO-FKNN, CBFO-FKNN, and SVM, while KELM and LOGO-SVM obtained similar results and performed worst.

On the Istanbul dataset, CBFO-FKNN achieved the highest ACC, 83.68%, while LOGO-SVM and PSO-FKNN yielded the second-best average ACC, as shown in Figure 7, followed by KELM, SVM, FOA-FKNN, FA-FKNN, BFO-FKNN, and GA-FKNN. Figures 6 and 7 also show that CBFO-FKNN yields a smaller or comparable standard deviation relative to its counterparts on all four performance metrics on both datasets. In addition, SVM with local learning-based feature selection improves performance on both datasets, which indicates that they contain some irrelevant or redundant features.
In LOGO-SVM, the LOGO method was used for feature selection: all the features were ranked by LOGO, feature subsets formed by adding features in rank order were evaluated incrementally, and the subset that achieved the best accuracy was used in the experiment.
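The incremental subset evaluation can be sketched as follows, given a precomputed feature ranking. The `evaluate` callback (for example, the inner-CV accuracy of an SVM trained on the subset) is hypothetical, and keeping the smaller subset when two subsets tie on accuracy is an assumption the text does not address.

```python
def best_incremental_subset(ranked_features, evaluate):
    """Evaluate nested subsets of a feature ranking and keep the best one.

    ranked_features: feature indices ordered from most to least relevant.
    evaluate: callable(subset) -> accuracy (hypothetical; e.g. inner-CV SVM accuracy).
    Returns (best_subset, best_accuracy). Strict '>' keeps the smaller subset on ties.
    """
    best_subset, best_acc = [], float("-inf")
    for size in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:size])
        acc = evaluate(subset)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

With d features, this costs only d evaluations instead of the 2^d needed for an exhaustive subset search, at the price of trusting the ranking.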

According to the results, the superior performance of the proposed CBFO-FKNN indicates that it was the most robust tool for the detection of PD among the nine methods. The main reason may be that the chaotic initialization and Gaussian mutation mechanisms greatly improve the diversity of the population and increase the probability of BFO escaping from local optima. This gives CBFO a better chance of finding the optimal neighborhood size and fuzzy strength values, which helps the FKNN classifier achieve its maximum classification performance more efficiently. Figure 8 displays the surfaces of the training classification accuracies achieved by the SVM and KELM methods for several folds of the training data via the grid search strategy on the Oxford dataset. In our experiments, the original BFO was more prone to overfitting. To address this, this paper introduces chaotic initialization, which enriches the diversity of the initial population and improves its convergence speed, and a Gaussian mutation strategy, which enhances the ability of the algorithm to escape from local optima, thereby alleviating the overfitting of FKNN during classification.
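The grid search behind the accuracy surfaces of Figure 8 can be sketched as an exhaustive loop over exponent pairs. The exponent ranges below follow the coarse grid recommended in the LIBSVM practical guide (C from 2^-5 to 2^15 and γ from 2^-15 to 2^3, in steps of 2 in the exponent); the ranges actually used in this study are not stated, so treat them as assumptions, and `score` is a hypothetical callback returning, for example, mean inner-CV accuracy.

```python
def grid_search(score, c_exponents=range(-5, 16, 2), g_exponents=range(-15, 4, 2)):
    """Exhaustive search over C = 2**a and gamma = 2**b.

    score: callable(C, gamma) -> mean validation accuracy (hypothetical callback).
    Returns ((best_C, best_gamma), best_score, surface), where surface maps
    (a, b) exponent pairs to scores, i.e. the data behind an accuracy surface plot.
    """
    surface = {}
    for a in c_exponents:
        for b in g_exponents:
            surface[(a, b)] = score(2.0 ** a, 2.0 ** b)
    best_a, best_b = max(surface, key=surface.get)
    return (2.0 ** best_a, 2.0 ** best_b), surface[(best_a, best_b)], surface
```

The default grid has 11 × 10 = 110 points, each requiring a full inner CV, which is why a metaheuristic search such as CBFO can be attractive when the grid would have to be finer.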

We also investigated whether the diagnosis was affected by age and gender, again taking the Oxford dataset as an example. The dataset was divided by age (old or young) and by gender (male or female). For age, the mean age of 65.8 years was chosen as the dividing point: samples from subjects older than 65.8 years form the old group, and samples from subjects younger than 65.8 years form the young group. This yields four groups: male, female, old, and young. Table 17 gives the confusion matrices of the classification results for the four groups. In both the male and the female groups, 3 PD samples were misclassified as healthy and 2 healthy samples were misclassified as PD, which indicates that gender has little impact on the diagnostic results. In the old group, 4 PD samples were misclassified as healthy, whereas no samples were misclassified in the young group. This suggests that speech samples from the old group are much more likely to be misclassified than those from the young group.
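The subgroup analysis amounts to tallying confusion-matrix cells per group, as in the sketch below. The record layout (dictionaries with hypothetical keys `y_true`, `y_pred`, `age`, and `gender`, with 1 = PD and 0 = healthy) is an assumption, and assigning a subject aged exactly 65.8 to the young group is an arbitrary choice, since the text does not cover that case.

```python
AGE_CUTOFF = 65.8  # mean age used as the dividing point in the text


def age_group(record):
    """'old' if older than the cutoff, else 'young' (exactly 65.8 -> young, an assumption)."""
    return "old" if record["age"] > AGE_CUTOFF else "young"


def group_confusion(records, key):
    """Tally TP/FN/TN/FP per group, where key(record) names the group.

    records: dicts with hypothetical keys 'y_true' and 'y_pred' (1 = PD, 0 = healthy)
    plus subject attributes such as 'age' and 'gender'.
    """
    out = {}
    for r in records:
        cell = out.setdefault(key(r), {"TP": 0, "FN": 0, "TN": 0, "FP": 0})
        if r["y_true"] == 1:
            cell["TP" if r["y_pred"] == 1 else "FN"] += 1
        else:
            cell["TN" if r["y_pred"] == 0 else "FP"] += 1
    return out
```

Passing `key=age_group` gives the old/young tables, `key=lambda r: r["gender"]` the male/female tables, and a composite key such as `lambda r: (r["gender"], age_group(r))` gives the gender-by-age cells used in the finer analysis.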

To further investigate the impact of gender and age on the diagnostic results, we divided the samples into male and female groups within each age group, and into old and young groups within each gender group. This yields the 8 groups shown in Table 18, where the detailed classification results are given as confusion matrices. Within both the male and the female groups, the misclassification rates of the old and young subgroups are close. It can also be observed that no sample was misclassified in the male or female groups of young subjects, whereas one sample was misclassified in each of the male and female groups of old subjects. We conclude that presbyphonia (age-related voice change) may act as a confounder in the female and male dysphonic sets, and that the diagnostic results are less affected by gender.

Table 19 compares the classification accuracy of the proposed method with that of other methods applied to the diagnosis of PD. The proposed CBFO-FKNN method achieved relatively high classification accuracy and could therefore serve as an effective diagnostic tool.

6. Conclusions and Future Work

In this study, we proposed a novel evolutionary instance-based approach based on a chaotic BFO and applied it to distinguishing PD patients from healthy people. In the proposed methodology, the chaos-enhanced BFO was used to automatically determine the two key parameters of FKNN, allowing FKNN to reach its full potential. The results showed that the proposed CBFO-FKNN approach outperformed five other FKNN models based on nature-inspired methods and three commonly used advanced machine learning methods, namely SVM, LOGO-SVM, and KELM, in terms of various performance metrics. In addition, the simulation results indicated that CBFO-FKNN could serve as an efficient computer-aided diagnostic tool for clinical decision-making. The experimental analysis also suggests that presbyphonia may act as a confounder in the female and male dysphonic sets and that the diagnostic results are less affected by gender. Additionally, speech samples from the old group are much more likely to be misclassified than those from the young group.

In future studies, the proposed method will be implemented in a distributed environment to further boost its PD diagnostic efficacy. Another direction is to perform feature selection with the CBFO strategy to further improve the performance of the proposed method. Finally, because the available PD vocal datasets are small, we will apply the proposed method to much larger datasets.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (61702376 and 61402337). This research is also funded by Zhejiang Provincial Natural Science Foundation of China (LY17F020012, LY14F020035, LQ13F020011, and LQ13G010007) and Science and Technology Plan Project of Wenzhou of China (ZG2017019, H20110003, Y20160070, and Y20160469).