Abstract

Healthcare systems are extensively used, with an increasing focus on patient safety. Software engineering for healthcare applications is an emerging research area, and detecting defects is a critical step of the software development process for such applications. The performance of a Software Defect Prediction (SDP) model depends on the features of the healthcare system; irrelevant features decrease the performance of the model. An optimized feature selection technique is needed to recognize and remove the irrelevant features. In this study, a new optimized feature selection technique, i.e., multiobjective Harris Hawks Optimization (HHO), is proposed for the binary classification problem together with the Adaptive Synthetic Sampling (ADASYN) technique. Multiobjective HHO pursues two main objectives: one to reduce the total number of selected features and the other to maximize the performance of the proposed model. The multiobjective feature selection technique helps to find the optimal solution for the desired objectives and increases the classification performance in terms of accuracy, AUC, precision, recall, and F1-measure. The study conducts an experiment on a healthcare dataset. Six different classification techniques (RF, SVM, bagging, adaptive boosting, voting, and stacking) are implemented on the dataset. The proposed model predicts software defects with a significant classification accuracy of 0.990 and an AUC score of 0.992.

1. Introduction

Software consultants and vendors develop high-quality healthcare systems such as middleware medical devices, hospital management systems, and electronic systems used in the medical domain [1]. In recent years, software applications have played a vital role in every organization and business, and companies rely on them for handling their daily operations [2]. These software applications and systems hold critical importance in the healthcare domain due to the severe consequences associated with their malfunction. Therefore, healthcare applications are based on design rules and best practices for high-quality applications [1, 2].

The Software Development Life Cycle (SDLC) (abbreviations and acronyms are given in Table 1) provides a basic set of rules used in the design, development, and testing of software applications [2]. However, with the increase in size and complexity of software, ensuring quality via testing results in increased cost [13]. In recent years, the healthcare industry has been growing in both high- and low-income countries [4-6]. These systems face many challenges, from the elicitation of user requirements to the development, testing, and deployment of software applications [17]. It is important to consider user design rules and methodologies to increase the quality of software applications. Software developers are unable to perform exhaustive testing of high-quality healthcare software applications; therefore, a value-based approach to testing is required, and the most defect-prone parts of the system must be identified and tested. SDP has become one of the most investigated areas in the field of software quality [8, 9]. Software defect prediction is a formal approach with many different models, processes, and assessment standards [10]. It suffers from many challenges, such as heterogeneous datasets, problems in extracting the best features for defect prediction, and insufficient prediction models [8, 10, 11]. Defect prediction designs are frequently used by industries such as healthcare to help in fault prediction and in estimating the effort required to ensure the reliability of healthcare software [11]. Recognizing and removing software defects in healthcare systems is a resource-intensive activity, and early defect prediction results in timely defect correction [12]. The purpose of SDP is to predict the possible defects and features of healthcare software systems [13]. Software defect prediction performance is affected by the representation of defective data features [14]; therefore, it is important to remove irrelevant features while designing the software model [15]. Feature selection intends to enhance the accuracy of SDP models by dropping irrelevant features and decreasing the time and complexity of these algorithms. The three feature selection approaches are the wrapper technique, the embedded technique, and the filter technique [16-18]. The filter technique assigns a score to each feature of the dataset, while the wrapper technique uses classifiers to estimate the results of feature selection; the literature shows that wrapper techniques outperform filter techniques [16]. Embedded techniques consider the selection of features as part of learning the classifiers [16, 17]. SDP models are based on three components: machine learning (ML) algorithms, soft computing, and software metrics [14]. Establishing the metrics model of the software involves gathering metric features for predicting defects; this approach does not work satisfactorily across diverse projects or diverse versions. Therefore, researchers apply software change metrics to control the problem and make precise software defect predictions. However, this technique is inappropriate and time-consuming for complicated systems in large industries. Predicting defects at an early stage of the software implementation process helps to decrease the cost of implementation and computation [14-16]. Existing techniques only identify defects in the typical code base [10]. This research is designed to test the following hypotheses.
(i) Alternative hypothesis (H1): the selected features increase the accuracy of the software defect prediction model.
(ii) Null hypothesis (H0): the selected features do not increase the accuracy of the software defect prediction model.

The most promising methods are ML algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naïve Bayes (NB), and Logistic Regression (LR) [18-21] and ensemble classifiers [22-25]; various feature selection techniques [16, 18, 20, 21] are also used in research [14]. Machine learning techniques are the soul of data mining and are used successfully for solving complicated problems in both industry and research [16].

In healthcare software, many defects remain undetected by developers during the software development process. This is because of misinterpretation of the requirements of healthcare software, an unreasonable development process, insufficient development experience, or less effective models [12-16]. The presence of defects decreases quality, which may result in the failure of healthcare software projects [2, 18]. Effective techniques for identifying potentially defective components as early as possible can be used to optimize the testing effort. There is a need to address these issues by using state-of-the-art defect prediction techniques to enhance the quality of healthcare systems.

The objective of this research is to identify relevant features for predicting software defects in healthcare systems and the ML algorithms that improve the accuracy of SDP. The main contributions of the research are as follows:
(1) The proposed technique provides better accuracy by using a novel optimized feature selection technique with machine learning classifiers for healthcare systems.
(2) Early and accurate detection of defects helps in achieving high accuracy and performance of the defect prediction model.

The paper is organized as follows: Section 2 gives the background of SDP research and describes state-of-the-art ML techniques, Section 3 describes the proposed methodology, and Section 4 presents the results.

2. Literature Review

In software systems, defects are detected by applying diverse algorithms, especially ML algorithms. Researchers have used different feature selection algorithms combined with different classifiers to increase the performance of software systems. The defect prediction literature in the healthcare domain is limited in terms of machine learning algorithms. The literature is divided on the basis of healthcare software and general software for defect prediction.

2.1. Defects Prediction Techniques for Healthcare Software

Different software development process models are used to predict defects in healthcare software. Defect prediction techniques such as model localization, static analysis, software metrics, and code review are used to identify defects in healthcare applications [2, 3]. Modern analysis tools help to reduce software maintenance cost through early detection of defects in the software development process, and static analysis tools analyze a software system without executing it [3]. The recall data of user interface errors in medical devices has been analyzed in two phases. In the first phase, about 423 medical-device recalls attributed to user interface software errors are identified; a semiautomatic filtering process is applied to quickly eliminate recalls that were not caused by software errors. In the second phase, a total of 499 user interface software errors are identified and a detailed classification of the errors is established. The data is classified into 20 categories used by healthcare providers, device manufacturers, and regulatory authorities to raise awareness of the impact of user interface software errors. The classification provides an evidence-based challenge to stakeholders to increase the quality of user interface software in different medical devices [7].

2.2. Machine Learning (ML) Techniques for Healthcare Software

Machine learning helps to diagnose problems in different medical domains and to analyze the most important clinical parameters for prediction [26-32]. ML techniques are used for data analysis tasks such as data regularity detection, which deals with data imperfection, and for continuous interpretation of data used in intensive care units. ML algorithms help in the integration of computer-based systems for healthcare software, which increases the quality and efficiency of healthcare systems [33, 34]. ML techniques are used to explore patterns from different medical data sources and help to predict defective data appropriately [35]. Different ML paradigms such as supervised learning, semisupervised learning, unsupervised learning, and reinforcement learning have been reviewed to develop efficient decision support systems for healthcare software; a machine learning-based health protection system is capable of identifying data patterns efficiently [35]. An ensemble ML model has been presented for predicting the dissolution behaviour of oxide glasses over time for different medical applications. A dataset of 1,300 records from an original glass dissolution experiment is used. The results demonstrate that the model accurately predicts the chemical degradation behaviour of different glasses and that ML algorithms can handle and utilize biomedical data from different perspectives [36]. Internet of Things (IoT) medical devices such as emergency medical equipment, medical drones, and ambulances face severe challenges like signal distortion and security issues [37, 38]. An efficient lightweight encryption algorithm has been presented to design a secure image encryption technique for the healthcare industry. The technique utilizes two different permutation strategies to secure medical images and is analyzed, evaluated, and compared with traditional encryption techniques in terms of execution time. Multiple experiments show that the proposed image encryption technique provides better efficiency than the traditional techniques [38].

2.3. Machine Learning Techniques for Defect Prediction for General Software
2.3.1. Feature Selection (FS) Techniques

Different metaheuristic optimization algorithms are used as search approaches in feature selection techniques to predict software defects. Researchers have applied base classifiers on different datasets and obtained the results summarized in Table 2. In [17], a cluster hybrid feature selection technique is used for defect prediction. The technique is applied on fifteen open source datasets; the best average AUC values achieved are 0.971 by Pearson's correlation, 0.809 by MIC, 0.915 by Spearman's correlation, and 0.915 by Kendall's correlation. Reference [18] proposes the Multiobjective Feature Selection (MOFES) method, which uses a Pareto-based multiobjective optimization algorithm; the AUC value is 1 with an average computational cost of 107 seconds. In [19], improved versions of WOA merged with a single-point crossover technique are proposed, using five different selection schemes, i.e., random, tournament, Roulette wheel, stochastic universal sampling, and linear rank; the computational cost of the model is high. In [20], a novel binary version of the HHO algorithm, i.e., EBHHO, is proposed for FS. In [21], a binary version of the Queuing Search Algorithm (QSA) built on a wrapper FS technique is proposed; the model is applied on 14 benchmark datasets, and the average AUC rank based on a rank test is 1.57; however, the prediction quality reduces when the oversampling ratio is greater than 300%. In [22], a ReliefF-based clustering feature selection approach (RFC) is proposed to identify and remove redundant and irrelevant features. The approach is applied on nine NASA SDP datasets and achieves the highest AUC values of 0.767 with the J48 classifier and 0.813 with Naive Bayes. In [23], a framework is proposed that uses a Multilayer Filter Feature Selection approach and a Feed Forward Artificial Neural Network (Multilayer Perceptron) for the prediction of defective modules. The framework is evaluated on 12 NASA datasets with an oversampling technique; the ROC value of 0.955, F-measure of 0.918, and MCC of 0.838 are significantly improved, but with little improvement in accuracy. In [24], a hybrid technique is proposed that combines the feature selection ability of the Opt-aiNet algorithm with ML classifiers to detect defects. The technique is applied on 5 open source NASA datasets; the results indicate that DT provides the highest accuracy of 94.82% and an AUC value of 0.90 for the JM1 dataset. Reference [25] proposes En-Binary Particle Swarm Optimization, a feature selection technique integrated with ensemble classifiers for defect prediction and based on a fitness function. The approach is applied on the SOFTLAB and MORPH datasets; the results reflect that it achieves the best F-measure rank compared with other feature selection techniques. In [39], Particle Swarm Optimization on object-oriented metrics is proposed for feature selection. Reference [40] proposes an improved binary dragonfly algorithm, an extended version of the dragonfly algorithm, for feature selection.

2.3.2. Sampling Techniques

The Adaptive Synthetic Sampling (ADASYN) technique is used in [16, 20] to rebalance the dataset and increase the quality of the classifiers. The Synthetic Minority Oversampling Technique (SMOTE) is used in [21, 25, 41] to balance the dataset and increase the accuracy of the proposed models; the results show that SMOTE enhances the performance of defect prediction on highly imbalanced datasets. A study [23] uses the Random Oversampling technique on 12 NASA datasets, which decreases the dataset imbalance ratio by copying instances of the minority class; as a side effect, this technique enlarges the dataset volume.

2.3.3. Ensemble Techniques

Machine learning algorithms show promising performance for solving the software defect prediction problem, and ensemble techniques are used for predicting software defects. In [41], adaptive boosting, random forest, bagging, and XGBoost are used as an ensemble. Bagging, boosting, stacking, and voting ensemble classifiers [42, 43] are used for predicting software defects. Reference [44] provides an empirical comparison of SDP models developed through different boosting-based ensemble techniques on three open source projects; three ensemble techniques, RUSBoost, SMOTEBoost, and MSMOTEBoost, are integrated with resampling methods that improve the performance of the model. Many base classifiers are used, such as K-Nearest Neighbor (K-NN) [16, 18-20, 41, 42], decision tree (DT) [16, 19, 20, 40, 42], Linear Discriminant Analysis (LDA) [16-20], Logistic Regression (LR) [19], Naïve Bayes (NB) [18, 41], Random Forest (RF) [41, 43, 45], and Support Vector Machine (SVM) [19, 39, 43, 45]. Researchers have applied these base classifiers on different datasets and obtained the required results.

The literature discussed above covers defect prediction techniques used in different medical devices and ML techniques to predict defects in healthcare and general software. Keeping in view the critical nature of healthcare systems and the consequences a defect can have, software defect prediction is a much-needed endeavour. More research is required for sophisticated, timely, and accurate defect prediction.

3. Proposed Methodology

The proposed model predicts defects on the basis of selected features. For validation of the proposed model, a controlled experiment is performed in the Python language. The results are evaluated and compared using AUC and accuracy measures; the performance optimization key parameters precision, recall, and F1-measure are also used to evaluate model performance. A controlled experiment is a technical test mostly used by researchers for testing a unique variable. In this research, the independent variable (healthcare software defects) is used to test the effect on the dependent variables (accuracy, AUC). The controlled variables are very important because variables change according to requirements, which may impact the behavior of and relation between dependent and independent variables; in an experiment, control variables are significant for testing the credibility of the results. The model consists of four steps: the first step is to select the dataset; the second step is data preprocessing, which resamples the imbalanced data using the ADASYN technique and then splits the data into training and testing parts; the third step is to apply the wrapper feature selection technique HHO; and the last step is to perform the experiment using the ML algorithms RF, SVM, bagging, Adaboost, voting, and stacking. The proposed model is explained in Figure 1.
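
As a rough outline of these four steps, the following minimal Python sketch wires them together, following the order described above. The file name, label column, and random seeds are illustrative assumptions, not taken from the paper, and the feature selection and classifier steps are sketched in later sections.

  # Hypothetical end-to-end outline; "healthcare_defects.csv" and the
  # "defective" label column are assumptions for illustration.
  import pandas as pd
  from sklearn.model_selection import train_test_split
  from imblearn.over_sampling import ADASYN

  df = pd.read_csv("healthcare_defects.csv")                 # step 1: dataset
  X = df.drop(columns=["defective"]).values
  y = df["defective"].values
  X_res, y_res = ADASYN(random_state=0).fit_resample(X, y)   # step 2: resample
  X_tr, X_te, y_tr, y_te = train_test_split(                 # 70/30 split
      X_res, y_res, test_size=0.3, random_state=0)
  # step 3: HHO wrapper feature selection (see Section 3.6)
  # step 4: train RF, SVM, and ensemble classifiers (see Section 3.7)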

3.1. Experimental Setup

The experiment is conducted on a computer with an Intel(R) Core(TM) i7-2640M CPU at 2.80 GHz, 4 GB RAM, and a 64-bit operating system. Python is used as the experimental tool. The experiment addresses the following research questions:
(i) RQ1: what is the impact of the multiobjective feature selection method on the software defect prediction model?
(ii) RQ2: does the bagging-based ensemble classifier impact the accuracy of prediction of software defects?

3.2. Dataset Selection

The healthcare dataset is used for the experiment (https://www.kaggle.com/datasets/iqrayousaf/healthcare-dataset-for-defects-prediction). The dataset is created for healthcare systems by considering defects in different medical applications [2] and articles related to software defects in healthcare software [3, 7]. Defective data in medical applications can cause the death of patients, so the most critical defects are predicted. The healthcare dataset, available on Kaggle, has binary classes: defective and nondefective. The dataset is divided into two parts, where 70% is used as training data to train the proposed methodology and 30% is used as testing data to test the proposed model. The features of the healthcare dataset are given in Table 3.

3.3. Data Preprocessing

In this section, data preprocessing is performed to balance the highly imbalanced data and to recognize and remove irrelevant features. The data preprocessing step is the main part because it helps to obtain more accurate features and increases the performance of the prediction model. It consists of two main steps: resampling the dataset and feature selection.

3.4. Resample Dataset

Real-life classification problems involve large amounts of data, and extracting meaningful information becomes demanding in terms of space and computational time. Furthermore, irrelevant data may result in complicated and insufficient defect prediction models. Therefore, it is necessary to implement preprocessing methods to boost classifier performance [46]. Classifier performance is impacted by different factors such as the number of class types and the number of samples. In collected data, the imbalance problem occurs when the minority class is very small compared to the majority class. Even strong ML algorithms normally suffer when the dataset is skewed toward one class, and in practice, most collected datasets suffer from such disproportion, which decreases the overall performance of the algorithms. There are two major approaches for handling imbalanced data: the algorithm perspective and the data perspective. The data perspective rebalances the class distribution by resampling the data space, using either undersampling of majority classes or oversampling of minority classes. Resampling techniques try to reduce the dataset imbalance either deterministically or randomly [16]. Different techniques are recommended to address imbalanced data, such as ADASYN [16-20], SMOTE [21, 39, 41], and Random Oversampling [23]. In this experiment, Adaptive Synthetic Sampling (ADASYN) is used on the imbalanced dataset to enhance the performance of the classifiers. This technique synthesizes minority-class samples according to their distribution in the training dataset, paying more attention to samples that are hard to learn and less to samples that are easy to learn. The key of the ADASYN approach is to find a probability distribution used as a criterion to decide the number of synthetic samples for each minority data sample and finally obtain the new dataset [47]. The algorithm is given in Algorithm 1.

Input: Dataset S1 that contains n samples (vi, wi), i = 1, 2, 3, ..., n, where vi is a sample in the feature space and wi ∈ {0, 1} is its label; wi = 0 denotes the minority class and wi = 1 the majority class.
Output: Novel synthesized data samples.
1. Compute the class imbalance degree.
2. Compute S = (n1 - n2) × β, the total number of data samples to be synthesized, where n1 and n2 are the majority and minority class sizes and β ∈ [0, 1] is a coefficient.
3. For each minority data sample vi, find its K-nearest neighbors and compute the ratio ri = Δi / K, i = 1, 2, 3, ..., n, where Δi is the number of majority-class observations among the K-nearest neighbors.
4. Normalize the ratios as r̂i = ri / Σ ri, so that r̂i forms a probability distribution (Σ r̂i = 1).
5. Compute gi = r̂i × S, the number of synthetic samples required for each minority-class sample vi.
6. Synthesize the samples for each minority-class sample:
7. For m = 1, 2, ..., gi
8. Choose a sample vk randomly from the K-nearest minority neighbors of vi.
9. Let λ be a random number in [0, 1]; produce the synthetic sample s = vi + λ × (vk - vi).
10. Stop the algorithm.
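
The following is a minimal NumPy sketch of Algorithm 1 under the labeling convention above (0 = minority, 1 = majority); the function name and parameter defaults are illustrative, and in practice the ADASYN implementation in the imbalanced-learn library can be used instead.

  import numpy as np
  from sklearn.neighbors import NearestNeighbors

  def adasyn_sketch(X, y, beta=1.0, k=5, seed=0):
      # Simplified ADASYN following Algorithm 1 (0 = minority, 1 = majority).
      rng = np.random.default_rng(seed)
      X_min = X[y == 0]
      n_min, n_maj = len(X_min), int((y == 1).sum())
      S = int((n_maj - n_min) * beta)                # step 2: samples to synthesize
      nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
      _, idx = nn.kneighbors(X_min)                  # idx[:, 0] is the point itself
      r = np.array([(y[row[1:]] == 1).sum() / k for row in idx])  # step 3
      if r.sum() == 0:
          r[:] = 1.0                                 # fall back to uniform weights
      r_hat = r / r.sum()                            # step 4: probability distribution
      g = np.rint(r_hat * S).astype(int)             # step 5: per-sample counts
      nn_min = NearestNeighbors(n_neighbors=min(k, n_min - 1) + 1).fit(X_min)
      _, idx_min = nn_min.kneighbors(X_min)
      synthetic = []
      for i, g_i in enumerate(g):                    # steps 6-9: interpolate
          for _ in range(g_i):
              j = rng.choice(idx_min[i][1:])         # random minority neighbor
              lam = rng.random()                     # lambda in [0, 1]
              synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
      if synthetic:
          X = np.vstack([X, np.asarray(synthetic)])
          y = np.concatenate([y, np.zeros(len(synthetic), dtype=y.dtype)])
      return X, y
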
3.5. Feature Selection (FS)

Feature selection is an important preprocessing step for classification tasks. The main objective of feature selection is to search for an effective subset of features that represents the raw data to the highest possible degree. The contributions of feature selection are threefold: improved learning performance, decreased learning time, and a simpler model. Model performance does not scale uniformly with the feature subset; the relationship between model performance and the feature subset is nonlinear. To overcome these natural challenges, feature selection needs to optimize two different objectives, as its focus is to decrease the total number of features while enhancing the performance of the model [46]. The task of feature selection is defined as

minimize F(s) = (f1(s), f2(s)), subject to s ⊆ S, where S = {x1, x2, ..., xD} is the set of all features.

In the above formulation, f1 and f2 denote the first and second objectives, respectively: reduce the number of features and enhance the performance of the learning classifiers. Accordingly, when a feature subset s is selected from all features S, f1(s) equals the total number of features in the subset and f2(s) equals the classifier accuracy obtained on the testing data after training with the selected features only.

Considering these facts, an ideal solution for the feature selection problem would be a single feature that separates the classes perfectly. Figure 2 [46] represents sample solutions for feature selection: fs1, fs2, fs3, fs4, fs5, and fs6. Solutions fs1, fs2, and fs3 dominate the other solutions (fs4, fs5, and fs6) in both objectives; for example, fs1 selects fewer features and provides better results than fs4, so fs1 dominates fs4, and fs2 and fs3 likewise provide better results. Solutions fs1, fs2, and fs3 each perform best in different trade-offs between the objectives; therefore, the ideal (nondominated) solutions are those that fit the Pareto curve.
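
For concreteness, the dominance relation behind this figure can be written in a few lines of Python; the tuples below are hypothetical (number of features, classification error) pairs, with both objectives minimized, so maximizing accuracy corresponds to minimizing error.

  def dominates(a, b):
      # True if solution a Pareto-dominates b: no worse in every objective
      # and strictly better in at least one (both objectives minimized).
      return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

  # Illustrative values: fs1 selects fewer features with a lower error than fs4.
  fs1, fs4 = (2, 0.05), (5, 0.20)
  assert dominates(fs1, fs4) and not dominates(fs4, fs1)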

3.6. Proposed Multiobjective Harris Hawks Optimization Algorithm

The main aim of this research is to use a multiobjective optimization algorithm to search for the most diverse Pareto-optimal solutions. The multiobjective optimization algorithm is used with two different objectives: one to reduce the total number of selected features and the other to maximize the performance of the proposed model. The multiobjective Harris Hawks Optimization feature selection technique helps to find the optimal solution that achieves the desired objectives and increases the classification performance.

The HHO algorithm is a recently presented metaheuristic motivated by the chasing actions of hawks, and it proves to be efficient for difficult optimization problems like feature selection [46]. HHO was proposed by [48] and models the collaborative behavior and chasing style of Harris hawks as they pounce on their prey.

Finding the prey, the sudden pounce, and the different attacking plans form the exploitative and explorative phases of the algorithm [49]. The flock of hawks pounces on the prey collectively, approaching it from different directions. The hunt completes by catching the target, or the hawks switch to other strategies. Harris hawks have a diverse set of patterns for following the target: the strategy changes when the lead hawk stoops at the target and the other hawks continue to pounce on it. These collective strategies fatigue the rabbit and increase its vulnerability. HHO is a population-based, gradient-free method and can be applied to any optimization problem. The HHO metaheuristic uses exploitation and exploration phases that are motivated by the hawks' actions while searching for prey, with sudden attacks and diverse pounce plans; this activity of the Harris hawk is modeled as the exploitation activity of the artificial hawk in the algorithm. Figure 3 represents how the exploitation and exploration phases alternate according to the prey's escaping energy (E) and the chance factors (q and r). The details of these activities are given as follows [48]:

3.6.1. Exploration Phase

Harris hawks have sharp eyes for tracking and detecting their target, but sometimes the target cannot easily be found, so the hawks scan the field to search for the prey. The hawks perch at a place and watch for the prey using two plans that are selected randomly: they perch according to the positions of the other hawks and the rabbit when q < 0.5, or perch on a random tall tree when q ≥ 0.5. Both strategies have equal chances. The representation of a solution (hawk) that selects or does not select each feature is given in Figure 4 [48].

The HHO method moves from the exploration phase to the exploitation phase according to the escaping energy of the prey. Over the iterations, the rabbit energy E decreases according to the formula E = 2E0(1 - t/T), where E0 is the initial energy of the rabbit, t is the current iteration, and T is the total number of iterations. E0 is a random number initialized at each iteration in the interval (-1, 1). The rabbit gains strength when E0 increases from 0 to 1.

The rabbit loses its energy when E0 decreases from 0 to -1, and as the number of iterations increases, the escaping energy decreases. HHO is in the exploration step when |E| ≥ 1 and in the exploitation phase when |E| < 1. In short, the exploration and exploitation phases occur when |E| ≥ 1 and |E| < 1, respectively [46-49].
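
A small sketch of this energy schedule and the resulting phase switch is shown below; the formula follows Equation (3), and the function name is illustrative.

  import numpy as np

  def escaping_energy(t, T, rng):
      # E = 2 * E0 * (1 - t/T); E0 is redrawn from (-1, 1) each iteration,
      # so |E| shrinks on average as the iterations progress.
      E0 = rng.uniform(-1, 1)
      return 2 * E0 * (1 - t / T)

  rng = np.random.default_rng(0)
  for t in (0, 15, 29):
      E = escaping_energy(t, T=30, rng=rng)
      phase = "exploration" if abs(E) >= 1 else "exploitation"
      print(f"t={t:2d}  E={E:+.3f}  -> {phase}")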

3.6.2. Exploitation Phase

The prey can normally run away easily from risky conditions, so the hawks apply diverse chasing strategies. In the exploitation phase, four different strategies are used according to the plan of the hawks. Suppose r indicates the chance of the prey: the prey escapes successfully when r < 0.5 and fails to escape when r ≥ 0.5. Soft and hard besiege are performed to encircle the rabbit; the hawks encircle it from different locations according to its energy and pounce together to increase the chance of grabbing it. The hawks intensify the besiege as the prey loses energy. Soft besiege is used when |E| ≥ 0.5, and hard besiege is applied when |E| < 0.5 [46-49].

3.6.3. Soft Besiege

The rabbit has a good energy level when r ≥ 0.5 and |E| ≥ 0.5, and it can escape through some random jumps. Meanwhile, the Harris hawks quietly encircle the rabbit to make it extremely tired and then execute the sudden pounce. In the feature selection setting, a small random subset of features from the original dataset is created (representing the movement of the rabbit in nature), and many of the rabbit's features are copied to the selected hawks [46].

3.6.4. Hard Besiege

The prey has a low level of energy when r ≥ 0.5 and |E| < 0.5 and cannot easily run away. The current position of the Harris hawk is updated by the equation X(t + 1) = Xrabbit(t) - E|ΔX(t)|, where ΔX(t) = Xrabbit(t) - X(t) denotes the difference between the current location of the rabbit and the location of the hawk at iteration t. For hard besiege in the proposed model, a single feature of the rabbit is copied to the current hawk. Figure 5 represents this step.

3.6.5. Soft Besiege with Progressive Rapid Dives

The prey can run away when r < 0.5 and |E| ≥ 0.5, and a soft besiege is applied before the sudden pounce. To model this behaviour of the prey, the HHO algorithm uses the Lévy flight distribution for high-level perturbation. According to the energy level of the rabbit, features are selected from a solution that differs from the current hawk [46]. The features are selected through a greedy selection method, which selects the best features at the moment and resolves conflicts as they arise later. With greedy feature selection, the classifiers become more efficient, and the defect prediction model provides more precise results.

3.6.6. Hard Besiege with Progressive Rapid Dives

Hard besiege is used when r < 0.5 and |E| < 0.5, and the prey cannot run away. This case is similar to the soft besiege, but the hawks try to reduce the distance to the prey. According to the energy level of the rabbit, different features are selected from the rabbit and copied to a hawk chosen randomly from the population. A smaller number of features is selected to avoid a high level of perturbation and keep the model stable [46]. The representation of the proposed model is given in Figure 6 [46].

The multiobjective Harris Hawks Optimization algorithm is provided in Algorithm 2, and its steps are described in detail here. The population size, the maximum number of iterations, and the iteration counter are taken as inputs. The hawk population is initialized using the feature selection function (Jfs), and the fitness value of each hawk is calculated using the fitness function. After this, the best location of the target is found and set as the rabbit location Xrb, and the initial energy of the rabbit is updated. The energy level of the rabbit is then updated, and the exploration and exploitation phases are performed. Soft besiege and hard besiege are performed while updating the positions of the hawks. Progressive rapid dives with soft and hard besiege are performed with greedy feature selection, which selects the relevant features that increase the performance of the model.

Input: Population size M, the maximum number of iterations T, and the iteration counter t.
Output: Best solution
  1. Initialize the population of hawks Xi
  2. While (t < T)
  3. Calculate the fitness value of the hawks
  4. Search the best location and set it as the location of the rabbit Xrb
  5. for (each hawk Xi) do
  6. Update the initial energy E0
  7. Update the energy level E of the rabbit (Equation (3))
  8. if (|E| ≥ 1) then
  9. Perform Exploration
  10. else if (|E| < 1) then
  11. Perform Exploitation
  12. if (r ≥ 0.5 and |E| ≥ 0.5) then
  13. Perform Soft besiege
  14. else if (r ≥ 0.5 and |E| < 0.5) then
  15. Perform Hard besiege and update the hawk's position (Equation (4))
  16. else if (r < 0.5 and |E| ≥ 0.5) then
  17. Perform Soft besiege with progressive rapid dives with greedy feature selection
  18. else if (r < 0.5 and |E| < 0.5) then
  19. Perform Hard besiege with progressive rapid dives with greedy feature selection
  20. Calculate the fitness value of the updated hawk
  21. Result ← Best feature subset
  22. Return Result
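
As a rough illustration of how such a wrapper works, the sketch below implements a heavily simplified binary HHO for feature selection: the two objectives are scalarized into one weighted fitness rather than tracked as a Pareto front, and the four besiege strategies are reduced to simple bit-copy operators, so this is a sketch of the general mechanism under stated assumptions, not the authors' exact algorithm. All parameter values are illustrative.

  import numpy as np
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.metrics import accuracy_score

  def fitness(mask, X_tr, y_tr, X_te, y_te, alpha=0.99):
      # Scalarized bi-objective fitness (lower is better): classification
      # error plus a small penalty on the fraction of selected features.
      if mask.sum() == 0:
          return 1.0                                  # empty subset is useless
      clf = RandomForestClassifier(n_estimators=50, random_state=0)
      clf.fit(X_tr[:, mask], y_tr)
      err = 1 - accuracy_score(y_te, clf.predict(X_te[:, mask]))
      return alpha * err + (1 - alpha) * mask.sum() / mask.size

  def binary_hho_fs(X_tr, y_tr, X_te, y_te, n_hawks=10, T=30, seed=0):
      rng = np.random.default_rng(seed)
      D = X_tr.shape[1]
      hawks = rng.random((n_hawks, D)) > 0.5          # random initial feature masks
      fits = np.array([fitness(h, X_tr, y_tr, X_te, y_te) for h in hawks])
      for t in range(T):
          rabbit = hawks[fits.argmin()].copy()        # best mask so far (the rabbit)
          for i in range(n_hawks):
              E = 2 * rng.uniform(-1, 1) * (1 - t / T)  # escaping energy, Equation (3)
              trial = hawks[i].copy()
              if abs(E) >= 1:                          # exploration: random perching
                  flip = rng.random(D) < 0.3
                  trial[flip] = ~trial[flip]
              else:                                    # exploitation: besiege
                  # soft besiege copies fewer rabbit features than hard besiege
                  share = 0.3 if abs(E) >= 0.5 else 0.7
                  copy = rng.random(D) < share
                  trial[copy] = rabbit[copy]
              f = fitness(trial, X_tr, y_tr, X_te, y_te)
              if f < fits[i]:                          # greedy selection
                  hawks[i], fits[i] = trial, f
      return hawks[fits.argmin()]                      # boolean mask of selected features

The returned boolean mask can then be used to index the feature matrices, e.g., X_tr[:, mask], before training the classifiers of Section 3.7.
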
3.7. Classification Algorithms

To differentiate defective software modules from clean ones, two main classifiers are used: Support Vector Machine (SVM) and Random Forest (RF). These classification methods are important, widely used in machine learning (ML), and deliver strong classification performance. The main aim of the classifiers is to extract the pattern that discloses the particular class with which each data instance is associated in the available dataset [22].

3.7.1. Random Forest

Random Forest is a supervised classification method that uses a collection of trees to create a forest [50]. Random Forest chooses features randomly and builds the model using decision trees: it constructs different decision trees (the random forest) by choosing random variables and data. From the chosen attributes, a number of instances are randomly selected and allocated to the classification learning algorithm [41]. The random forest algorithm is split into two methods [50].

(1) Random Forest Creation. Each tree is trained on a random sample drawn with replacement from the training set. The pseudocode for the creation of the random forest is given below.

Select random "k" features from the total number of "m" features. Among the "k" features, compute the node "d" using the best split point.

Divide the node into subnodes using the best split.

Repeat the above steps until the number of nodes reaches "l."

By repeating all the above steps "n" times, create the forest of "n" trees.

(2) Created Tree Prediction. For training the separate trees, randomly selected features are used to search for splits. The random allocation decreases the correlation between trees, which enhances prediction performance. The following procedure is used for prediction:

Pass the test features through the rules of each randomly built decision tree.

Compute the votes for each predicted target.

Take as the final prediction the predicted target that obtains the highest votes.

3.7.2. Support Vector Machine

Support Vector Machine (SVM) uses the kernel trick for solving nonlinearly separable problems by mapping the points into a high-dimensional space. It mitigates the overfitting problem inherent in learning algorithms. The main aim of SVM is to search for the optimal hyperplane between the dataset classes by maximizing the margin between the closest points of the classes; the maximum-margin hyperplane provides the maximum distance between both classes [46].

3.7.3. Bagging

Bagging is an ensemble approach that enhances the accuracy of ML techniques by integrating the predictions of different weak classifiers. It gives better results for unstable classifiers that are sensitive to small changes in the training set and results in high prediction performance. Bagging obtains predictions many times from several training sets, and these predictions are integrated by voting. To explain the bagging algorithm, suppose a dataset with n instances and a binary class label. The procedure of the bagging algorithm is as follows [44].

Randomly create a training set of size n, sampling the data with replacement.

Train any classification algorithm on the random training set and allocate a class to each node. Repeat the above steps several times. Use voting for the prediction of the class label.

3.7.4. Adaboost

Adaboost is the most widely used boosting algorithm; it gradually increases the weights of samples with classification errors. In each iteration, a new classifier is created to overcome the failures of the old classifier, and the created classifiers are then combined through a voting process. In essence, Adaboost promotes weak classifiers into a strong classifier through an adaptive lifting technique.

Therefore, the classification error rate reduces as the amount of training data increases. The following steps are used in the Adaboost algorithm [51].

Take the training dataset and learn from all training samples to obtain the first weak learning classifier; also set the maximum number of iterations (M).

The incorrectly classified samples and other data are combined to form the new training dataset, and at the same time, the sample weights are adjusted.

Repeat this for M iterations. New training data samples based on the new weights are created for the next iteration's learning classifier, and finally a strong classifier with a better classification effect is created.

3.7.5. Voting

The voting algorithm is mostly used to combine learning classifiers. The basic idea behind voting for classification is to combine diverse probability estimates: the integrated classifiers vote for the class label [52].

3.7.6. Stacking

Stacking is a beneficial ensemble machine learning technique. The basic idea behind the stacking ensemble algorithm is to use confidence scores as features when integrating different models and to train a metaclassifier that helps combine the predictions of the different learning classifiers [53].
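
A compact scikit-learn sketch of the six classifiers used in the experiment is given below; the hyperparameters and the base-estimator pairings inside the voting and stacking ensembles are illustrative assumptions, not the paper's exact configuration.

  from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                                AdaBoostClassifier, VotingClassifier,
                                StackingClassifier)
  from sklearn.svm import SVC
  from sklearn.linear_model import LogisticRegression

  rf = RandomForestClassifier(n_estimators=100, random_state=0)
  svm = SVC(kernel="rbf", probability=True, random_state=0)  # probabilities for soft voting
  models = {
      "RF": rf,
      "SVM": svm,
      "Bagging": BaggingClassifier(random_state=0),
      "Adaboost": AdaBoostClassifier(random_state=0),
      "Voting": VotingClassifier([("rf", rf), ("svm", svm)], voting="soft"),
      "Stacking": StackingClassifier([("rf", rf), ("svm", svm)],
                                     final_estimator=LogisticRegression()),
  }
  for name, clf in models.items():
      clf.fit(X_tr, y_tr)      # X_tr, y_tr from the preprocessing sketch above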

3.8. Evaluation Metrics

Prediction and classification problems have different evaluation measurements such as accuracy, specificity, sensitivity, precision, and the Area under the Receiver Operating Characteristic curve (ROC-AUC). In this experiment, we evaluate the proposed model based on accuracy and the AUC value. The ROC-AUC evaluation method is widely used in software defect prediction; the AUC value is computed from the false positive rate versus the true positive rate, which in turn depend on specificity and sensitivity. Accuracy is the evaluation metric that measures how correctly the defective and nondefective parts of the software are distinguished; it is estimated as the proportion of true positives and true negatives among all evaluated cases. The optimization key parameters precision, recall, and F1-measure are also used to evaluate the performance of the model.
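
All five metrics can be computed directly with scikit-learn; the helper below is a small sketch, and the function name is illustrative.

  from sklearn.metrics import (accuracy_score, roc_auc_score, precision_score,
                               recall_score, f1_score)

  def evaluate(clf, X_te, y_te):
      # Compute the five metrics used in this study for a fitted binary classifier.
      y_pred = clf.predict(X_te)
      y_prob = clf.predict_proba(X_te)[:, 1]   # probability of the defective class
      return {
          "accuracy": accuracy_score(y_te, y_pred),
          "AUC": roc_auc_score(y_te, y_prob),
          "precision": precision_score(y_te, y_pred),
          "recall": recall_score(y_te, y_pred),
          "F1": f1_score(y_te, y_pred),
      }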

4. Experimental Results

In this section, we evaluate the performance of the proposed model on different parameters. These measurements are executed to check whether the proposed model is better than existing techniques and whether it is suitable for healthcare application defect prediction. The proposed model is applied on the healthcare dataset, and six different classification techniques are implemented on this dataset. The proposed model reveals the best results compared with other state-of-the-art techniques. The selected features of the healthcare dataset are given in Table 4.

The Area under the Curve and accuracy values obtained with the selected features, using RF and SVM as base classifiers and bagging, Adaboost, voting, and stacking as ensemble classifiers, are given in Table 5. The obtained results indicate that the best AUC values are 0.992 and 0.957 for RF as a base classifier and Adaboost as an ensemble classifier, respectively. RF as a base classifier provides the best accuracy of 0.990, and stacking as an ensemble classifier gives the best accuracy of 0.976.

In Table 6, the results of the proposed model without feature selection are provided. It can be clearly seen that the proposed model does not perform as well without feature selection for defect prediction.

In Table 7, the optimization parameters precision, recall, and F1-measure are provided to evaluate the performance of the proposed optimization feature selection algorithm; the accuracy and AUC parameters are used for comparison.

The comparison of the AUC values of the different classification algorithms with and without feature selection is provided in Figure 7. It can be seen that RF as a base classifier and Adaboost as an ensemble classifier with feature selection perform better for healthcare software defect prediction. The overall performance of the proposed model is better with feature selection for defect prediction of healthcare software.

In Figure 8, the accuracy comparison of all classifiers is provided; it shows that the proposed approach performs better with feature selection. RF as a base classifier and stacking as an ensemble classifier with feature selection perform better for healthcare software defect prediction.

4.1. Findings and Discussions

The experiment demonstrates that the proposed model performs best on the healthcare dataset. To check the performance of the proposed model, two metrics, accuracy and Area under the Curve (AUC), are used; the optimization key parameters precision, recall, and F1-measure are also used to evaluate the performance of the model. RF as a base classifier gives the best AUC result, and stacking as an ensemble classifier provides the best accuracy. SVM and RF give the best precision, recall, and F1-measure results, respectively. The overall performance of the proposed model is better for healthcare application defect prediction.

4.1.1. RQ 1: What Is the Impact of Multiobjective Feature Selection Method on the Software Defect Prediction Model?

To evaluate the effectiveness of the multiobjective HHO, we must examine the effects of the multiobjective feature selection technique. Using the HHO algorithm, relevant features are selected using various population sizes (i.e., 5, 10, 15, 20, and 30) and iteration counts (i.e., 10, 20, 30, 40, and 50); a sweep of this kind is sketched below. The population size and the number of iterations play an important role in the performance of the prediction model. The proposed model gives the best results when the population size is 10 with 30 iterations, and the worst performance occurs when the population sizes are 20 and 30 with 40 and 50 iterations, respectively. To obtain the best results, it is important to tune the parameters of the feature selection technique carefully. Dataset variations also impact the feature selection method: before the experiment, the dataset was highly imbalanced, and applying HHO on the imbalanced dataset did not yield relevant features. In this experiment, to obtain the relevant features and the best results, the dataset is balanced using the adaptive synthetic sampling technique before feature selection. The multiobjective feature selection method clearly impacts the software defect prediction model; without feature selection, the model does not provide the best performance. The results of the proposed model without feature selection are provided in Table 6 and make clear that the healthcare defect prediction model performs worst without feature selection. Therefore, in this experiment, the multiobjective feature selection method is used to pursue the two objectives: one is to select the relevant features provided in Table 4, and the other is to achieve the best results given in Table 5.
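
A hypothetical sketch of this parameter sweep, reusing the binary_hho_fs and evaluate helpers sketched earlier (both illustrative, not the paper's exact code):

  from sklearn.ensemble import RandomForestClassifier

  # Sweep the population sizes and iteration counts reported above.
  results = {}
  for n_hawks in (5, 10, 15, 20, 30):
      for T in (10, 20, 30, 40, 50):
          mask = binary_hho_fs(X_tr, y_tr, X_te, y_te, n_hawks=n_hawks, T=T)
          clf = RandomForestClassifier(random_state=0).fit(X_tr[:, mask], y_tr)
          results[(n_hawks, T)] = evaluate(clf, X_te[:, mask], y_te)
  best = max(results, key=lambda key: results[key]["AUC"])
  print("best (population size, iterations):", best, results[best])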

4.1.2. RQ 2: Does the Bagging-Based Ensemble Classifier Impact the Accuracy of Prediction of Software Defects?

The ensemble classifiers bagging, Adaboost, voting, and stacking are implemented on the healthcare dataset to check their impact on the accuracy of defect prediction for healthcare applications. The accuracy of the ensemble classifiers is provided in Table 5: stacking provides the best accuracy for the healthcare application dataset, and bagging provides the worst result.

4.2. Threats to Validity

In this experiment, the different threats to validity are explained.

4.2.1. Internal Validity

HHO is used for feature selection, but other feature selection techniques can be used for software defect prediction. The performance of the proposed approach can vary with different feature selection techniques such as GWO, DF, and GA. In addition, two classifiers (RF and SVM) and four ensemble models (bagging, Adaboost, voting, and stacking) are used in the experiment to achieve the best results; nevertheless, more models such as deep learning techniques can be used.

4.2.2. External Validity

The external threats to validity are minimized by creating a dataset from the data of different healthcare applications. The experiment is executed on the healthcare application dataset, and the proposed approach performs best on this dataset. However, if different open source and closed source projects are used, the results can vary.

4.2.3. Construct Validity

A construct threat to validity is the selection technique used within HHO to select the subset of features. In the experiment, a greedy selection technique is used to select the best subset of features, which provides the best experimental results on the basis of the selected features. The use of other selection techniques such as best-first search, random-based tournament, and roulette wheel could impact the results of the proposed approach.

5. Conclusion

Software engineering is used by many software consultants and vendors to develop high-quality healthcare systems such as middleware medical devices, patient record management systems, and electronic systems of medical devices. To design healthcare applications, software development process models are required to detect defects in a timely manner. In this paper, an optimized multiobjective HHO feature selection method is proposed with two objectives, i.e., to minimize the total number of selected features and to increase the performance of the classifiers, combined with the Adaptive Synthetic Sampling method to predict software defects. The multiobjective HHO works as a wrapper-based feature selection method, and the ADASYN technique is used to increase the quality of the dataset. Different machine learning techniques, namely, RF and SVM, and ensemble classifiers, namely, bagging, Adaboost, voting, and stacking, are used for defect prediction. The optimization key parameters precision, recall, and F1-measure are used to evaluate the performance of the optimization model. After solving the dataset imbalance issue, the proposed model enhances the performance of all the algorithms. RF as a base classifier and Adaboost as an ensemble classifier perform better for healthcare software defect prediction based on AUC, while RF and stacking perform better than the other classifiers based on accuracy. The proposed model predicts software defects with a significant classification accuracy of 0.990 and an AUC score of 0.992. The obtained results clearly indicate that the proposed model is well suited for healthcare software defect prediction. In the future, work can be carried out with deep learning algorithms such as CNN and ANN, to check the model performance on more datasets from open source and closed source projects.

Data Availability

The (Healthcare Dataset for Defect prediction) data used to support the findings of this study have been deposited in the Kaggle repository (https://www.kaggle.com/datasets/iqrayousaf/healthcare-dataset-for-defects-prediction).

Conflicts of Interest

The authors declare that they have no conflicts of interest.