Abstract

Missing data occurs when values of variables in a dataset are not stored. Estimating these missing values is a significant step during the data cleansing phase of a big data management approach. Missing data may arise from nonresponse or omitted entries, and if it is not handled properly it can produce inaccurate results during data analysis. While traditional methods such as the maximum likelihood method can extrapolate missing values, this paper proposes a bioinspired method based on the behavior of birds, specifically the Kestrel. The paper describes the behavior and characteristics of the Kestrel bird and uses this bioinspired approach to model an algorithm for estimating missing values. The proposed algorithm (KSA) was compared with the WSAMP, Firefly, and BAT algorithms, and the results were evaluated using the mean of absolute error (MAE). Statistical tests (the Wilcoxon signed-rank test and the Friedman test) were conducted to assess the performance of the algorithms. The results of the Wilcoxon test indicate that time does not have a significant effect on performance and that the difference in quality of estimation between the paired algorithms was significant; the Friedman test ranked KSA as the best evolutionary algorithm.

1. Introduction

The concept of big data is defined using several characteristics including velocity, volume, and value. The characteristics of velocity are related to how fast incoming data need to be processed and how quickly the receiver of information needs the results from the processing system [1]; the characteristics of volume are related to the amount of data that has to be processed; and the characteristics of value are what the user of big data management will gain from the data analysis. Other characteristics of big data include variety and veracity. The characteristics of variety focus on the different structures that data may take, such as text and images, while the characteristics of veracity focus on the authenticity of the data source that is being used for decision-making. These characteristics of big data have resulted in the use of innovative methods for decision-making. These innovative methods may require the combination of different technological platforms for storage, such as Hadoop, NoSQL, and relational databases, to successfully manage this big data. It is possible to have datasets on these platforms with missing values at random, which are a result of mismatched attributes [2] or omitted entries [3]. Hence, missing data is independent of the type of platform on which the data is placed. There are three categories of missing data: data missing completely at random (MCAR), data missing at random (MAR), and data missing not at random (MNAR) [4, 5], each of which calls for a different method of handling the missing data.

The missing completely at random (MCAR) category occurs when the missing values are randomly distributed throughout a matrix such that a missing value in a row of a matrix is not dependent on any other row entry in the dataset [4]. In other words, neither the row entry which is missing nor any other row entry can predict whether a value is missing. When this happens, the chance of the data being detected as missing does not depend on either the missing or the complete value in the same row entry of the matrix. The listwise method of handling MCAR removes all data that has one or more missing cases; however, this removal creates a problem in that the missing values produce both biased parameters and incorrect estimates in analysis. The pairwise method is another way to handle MCAR; it addresses the missing value problem by computing covariance estimates from all samples of cases observed on pairs of variables. The pairwise deletion method assumes that all data is missing completely at random; variables with missing data are then excluded during computation. This deletion can cause errors in computation because each element in the covariance matrix may be based on a different group of cases [6, 7].

The missing at random (MAR) category occurs when the missing value in a row of a matrix depends on another known row entry in the dataset. In other words, the missing value can be predicted from a previously known value in the dataset; thus, the missing value is dependent on the previously known value. When this happens, it becomes easy to trace a pattern in the missing values in a row of a matrix. The traditional approach to handling MAR is the pairwise deletion method described previously.

The missing not at random (MNAR) category (also known as nonignorable nonresponse) occurs when the missing value in a row of a matrix depends on other missing values in the row entry. When this happens, the known data cannot be used to estimate the missing value. Thus, the chance that the current value is detected as missing depends on the detection of previously missing values.

These traditional approaches to handling missing data, such as listwise deletion or case deletion, pairwise deletion, and sample mean substitution (i.e., k-NN and k-means clustering [4, 8]), are, however, not efficient at providing the best optimal estimates for missing values. For instance, the sample mean substitution method requires that each data point clustered around a centroid be computed so as to find the best estimates. Thus, the number of clusters, the number of data points, and the dimensions involved in computing missing values make it inefficient. With pairwise deletion, since the method assumes all data is missing at random, it uses the average sample size to estimate its standard error, which results in either underestimation or overestimation of the standard error in the analysis of missing values, and this makes it inefficient. In view of this, other, more efficient methods, such as the maximum likelihood method [9] and multiple imputation (for MAR), the expectation-maximization (EM) algorithm [4, 10], machine learning approaches (such as the autoencoder neural network), and metaheuristic algorithms, such as genetic algorithms [11], have been proposed to handle values missing at random.
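To make these traditional approaches concrete, the following minimal Python sketch (using pandas on a small hypothetical dataset) contrasts listwise deletion, pairwise covariance computation, and sample mean substitution:

```python
import numpy as np
import pandas as pd

# Small hypothetical dataset with values missing at random (NaN).
df = pd.DataFrame({
    "x1": [2.0, 4.0, np.nan, 8.0, 10.0],
    "x2": [1.0, np.nan, 3.0, 4.0, 5.0],
    "x3": [5.0, 6.0, 7.0, np.nan, 9.0],
})

# Listwise deletion: drop every row that has one or more missing cases.
listwise = df.dropna()

# Pairwise approach: each covariance entry is computed from the cases
# observed on that pair of variables, so entries may use different rows.
pairwise_cov = df.cov()  # pandas computes covariances from pairwise complete observations

# Sample mean substitution: replace each missing value with the column mean.
mean_imputed = df.fillna(df.mean())

print(listwise)
print(pairwise_cov)
print(mean_imputed)
```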

The maximum likelihood method estimates missing values by selecting the set of parameters or values that maximizes a likelihood function. The advantage of the maximum likelihood method is its consistency and unbiased estimation of the parameter closest to the observed value [12]. The expectation-maximization algorithm uses the maximum likelihood method to impute all missing values in a dataset [4]. This procedure uses probability to iteratively impute values, estimating an approximate parameter that moves closer to the missing value at each step [10]. The iterative process generates a weighted value that is improved in each iteration until a termination condition is reached. However, when there are many variables and multiple missing values, the computational time increases with each iteration. On the other hand, the autoencoder neural network, or autoassociative multilayer perceptron, consists of an input and an output layer where the number of inputs equals the number of outputs [13]. When used to estimate approximate missing values, this network uses activation functions to map a sparse input space through hidden units to the output space. In other words, the activation function is used to control the input data from a dataset. During the estimation process, weighted parameters are used in the hidden units of the network. These weight parameters are solved iteratively by maximizing the probability of the weight parameter in the hidden unit to produce the best value that is close to the missing value [5]. The advantage of the autoencoder neural network is that it gives reliable estimates of missing values; however, its efficiency depends on the number of hidden layers chosen, and the greater the number of hidden layers, the more computational time is required to estimate the missing value(s).
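As a hedged illustration of this kind of iterative, likelihood-style imputation (not the authors' implementation), scikit-learn's IterativeImputer repeatedly models each incomplete variable as a function of the others until the estimates stabilize, which is EM-like in spirit:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data matrix with missing entries marked as np.nan.
X = np.array([
    [7.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [10.0, 5.0, 9.0],
    [np.nan, 8.0, 1.0],
])

# Each round regresses a column with missing values on the remaining
# columns and re-imputes; rounds repeat until the estimates converge.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
print(X_filled)
```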

Genetic algorithms are an evolutionary approach based on the survival of the fittest. This survival depends on the mechanism of “natural selection” (Darwin, 1868, as cited in [14]), where each element is represented as a binary string. A genetic algorithm is an adaptive search procedure [15], as cited by [14], which uses operators such as crossover, mutation, and selection to find a globally optimal result/solution. The search procedure starts with an initial guess and attempts to improve the guess through evolution [14] by comparing the fitness of the initial generation of the population with the fitness obtained after applying the operators to the current population, until the final optimal value is produced. This adaptive search procedure is an iterative process that eliminates weak individuals of a population through a continuous update of the initial generation via multiple generations until the termination condition is reached. The adaptive search procedure helps to find approximate missing values [16] by optimizing an objective/fitness function in a given search problem.

Particle swarm optimization is a bioinspired method based on the swarm behavior of fish schools and bird flocks in nature [17]. The swarm behavior is expressed in terms of how particles adapt and make decisions on changes of position within a space based on the positions of neighboring particles. The advantage of swarm behavior is that as each individual particle makes a decision, an emergent behavior arises [18]. This emergent behavior is the result of local interaction among particles in a problem space. Among the swarm algorithms for finding the best possible solutions in a problem space are those based on fireflies [19], bats [20], and cuckoo birds [21]. The distinguishing characteristic of fireflies is the short and rhythmic flashes they produce [19]. This flashing light is used as a mechanism to attract mating partners and potential prey, and it serves as a warning to other fireflies. The signaling system of this flashing light mechanism is governed by simple basic rules underlying the behavior of fireflies. Unlike a genetic algorithm, which uses operators such as mutation, crossover, and selection, the Firefly algorithm uses attractiveness and brightness to improve certain individuals in its population. The similarity between the genetic algorithm and the Firefly algorithm is that both generate an initial population and continue to update it using fitness functions. The brighter fireflies attract those closest around them, fireflies whose flashes fall below a given threshold are removed from the population, and the brightest fireflies form the next generation; the generations/iterations continue until a selection criterion or the maximum number of generations is reached. The behavior of fireflies, where a bright firefly attracts a firefly with weaker brightness, has been applied to missing data imputation by finding estimates of values closest to known values and then replacing the missing values with these estimates.

Wolf Search Algorithm (WSA) is a bioinspired heuristic optimization algorithm based on wolf preying behavior [22]. The behavior of a wolf includes its ability to hunt independently by remembering its own trait (meaning wolves have memory); its ability to merge with a peer only when the peer is in a better position (meaning there is trust among wolves to never prey on each other); its ability to escape randomly upon the appearance of a hunter [22]; and its use of scent marks as a way of demarcating its territory and communicating with other wolves [23]. This behavior enables wolves to adapt randomly to their environment when hunting. If a wolf finds a new better position, the incentive to assume this new position is stronger provided that the position is already inhabited by a companion wolf. The wolf search algorithm is an iterative search process that starts with setting the initial parameters, random initialization of the population, evaluating and updating the current population using a fitness test, and continuing to create new generations/iterations until some stopping criterion is met. Unlike the genetic algorithm, which uses operators such as mutation, crossover, and selection, or a particle swarm algorithm such as the Firefly, which uses attractiveness and brightness of prey, the wolf uses the attractiveness of prey within its visual range. Furthermore, instead of flocking together in one collective pack, each wolf organizes its own individual search. Therefore, the swarming behavior of WSA is delegated to each individual wolf, and this could form multiple leaders swarming from multiple directions towards the best solution rather than a single flock searching for an optimum in one direction at a time [22]. This swarm behavior of wolves could be used to estimate an approximate value close to known values in a missing-at-random situation.

A variation of WSA is the Wolf Search Algorithm with Step Minus Previous (WSAMP). This WSAMP allows wolves to remember a previous best position and avoid the old positions which do not produce the best solution.

The BAT algorithm [24] is a bioinspired method based on the behavior of microbats in their natural environment. The unique behavior that characterizes bats is their echolocation mechanism, which helps bats orient themselves and find prey within their environment. The search strategy of bats is controlled by the pulse rate and loudness of their echolocation. A change in the pulse rate helps to improve on the previous position, while the loudness alerts other bats to the best position that has been found [25]. Bat behavior has been applied in several optimization problems to find the best optimal solution. The BAT algorithm search process starts with random initialization of the population, evaluation of the population using a fitness function, and finding the best solution. Unlike the wolf algorithm, which uses the attractiveness of prey to govern its search, the BAT algorithm uses the pulse rate and loudness to control the search for the optimal solution.

Bioinspired search strategies are controlled by randomization, efficient local search, and the global best solution [24]. The contribution of this paper is that the random encircling behavior of certain birds, which is required to achieve an optimal solution for missing values, is first proposed as a new computational method and then examined in comparison with other metaheuristic algorithms, namely, the Wolf Search Algorithm with a step Minus Previous (WSAMP), the Firefly algorithm, and the BAT algorithm. The advantage of random encircling is that it maximizes the search space, thus creating a wider range from a hovering position for the best possible solution. We also evaluated the quality of the proposed computational method, the random encircling of birds such as the Kestrel, using a fitness function.

The remainder of this article is organized as follows. In Section 2, we describe the behavior of Kestrel birds. Section 3 discusses the proposed computational model. The model consists of mathematical formulations on the Kestrel’s characteristics. Section 4 discusses the experimental results as well as comparisons of the proposed algorithm with the existing approach. Section 5 presents statistical analysis of experimental results. Section 6 contains conclusions and future work.

2. Description of the Behavior of Kestrel Birds

The bioinspired algorithm is based on the behavior of Kestrel birds when hunting for prey. The Kestrel is a kind of bird that hunts by hovering (i.e., flight-hunt) or from a perch. These birds are strongly territorial and hunt individually [26, 27]. Reference [27] indicated that, during a hunt, Kestrels are imitative rather than cooperative. This suggests that Kestrels prefer not to communicate with each other but rather they imitate the behavior of other Kestrels with better hunting techniques and improve their hunting technique even though the hunting technique can change based on the type of prey, prevailing weather conditions, and energy requirements (for gliding or diving) [28].

During hunting, Kestrels use their eyesight to watch small and agile prey within their circling radius or coverage area referred to as the visual circling radius. The minute air disturbance from flying prey and the trail of urine and faeces from ground prey give an indication of the availability of prey. Once available prey is detected using these indications, the Kestrel positions itself to hunt. Kestrels are able to hover in changing airstream, maintain a fixed forward looking position with their eyes on the prey, and use random bobbing of the head to find the least distance between their position and the position of the prey. Also, the Kestrels possess excellent ultraviolet sensitive eyesight characteristics to visually locate trails because these trails of urine and faeces reflect ultraviolet light. Consequently, trails of prey such as voles become visible to Kestrels [29].

In hovering, Kestrels perform a wider search (global exploration) across territories within their visual circling radius, maintain a motionless position with a forward looking eye fixed on the prey, detect minute air disturbances from flying prey (particularly flying insects) to best position themselves to hunt the prey, and mostly move with precision through changing airstream.

Kestrels are able to flap their wings and adjust their long tails to stay in a place that is referred to as a still position in changing airstream. While in perch, mostly from high fixed structures, the Kestrel changes its perch every few minutes, performs a thorough search (a local exploitation using its individual hunt behavior) of its local territory with less energy requirements than a hovering hunt, and uses its ultraviolet sensitive capabilities to detect mammals such as voles closer to a perched area. This behavior suggests that, in perch, Kestrels conserve some of their energy and direct their ultraviolet sensitive capabilities to detect slowly moving prey on the ground. Moreover, an individual Kestrel with better perch and hovering skills in a wider search area stands a better chance to move faster on its prey or disperse sooner from its enemy than individual Kestrels that develop hunting skills only in local territories [27]. Therefore, it is significant to combine both types of hunting skills for a successful hunt. The characteristics of Kestrels are summarized as follows:
(1) Soaring: this gives a larger search space (global exploration) within the visual coverage area.
  (a) They maintain a still (motionless) position with eyesight fixed on the prey.
  (b) They encircle the prey beneath with keen eyesight.
(2) Perching: each Kestrel does a thorough search (local exploitation) within its visual coverage area.
  (a) They perform frequent bobbing of the head.
  (b) They get attracted to the prey using the detected visible trail and then glide to capture this prey.

The following assumptions are made on these characteristics:
(i) The still position gives a near-perfect circle, and thus frequent changes in circling direction depend on the position of the prey in shifting the center of the circling direction.
(ii) Frequent bobbing of the head gives a degree of magnified or binocular vision that helps in measuring the distance to the prey, which then enables the Kestrel to move with speed to strike.
(iii) Attractiveness is proportional to light reflection; thus, the higher or the longer the distance from the Kestrel to the trail, the less the trail brightness. This distance rule applies to both hovering height and distance away from a perch.
(iv) New trails are more attractive than old trails. Thus, trail decay or trail evaporation depends on the half-life of the trail.

3. The Proposed Computational Model

The proposed computational model for Kestrel’s missing value estimation is based on the description of Kestrels’ behavior and characteristics. The following mathematical expressions depict the characteristics of the Kestrel.

(i) Encircling. Encircling is when the Kestrel randomly shifts (or changes) the center of its circling direction to follow the current position of the prey. As the prey changes its current position, Kestrels randomly use the encircling behavior to encircle the prey. This movement of the prey determines the best possible position assumed by the Kestrel. The encircling [30] is expressed as

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_{p}(t) - \vec{X}(t) \right|. \quad (1)$$

Thus,

$$\vec{C} = 2 \cdot r_{2}, \quad (2)$$

where $\vec{A}$ is the coefficient vector given in (4) below, $\vec{D}$ is the encircling value obtained to indicate the best position, $\vec{X}_{p}$ is the position vector of the prey, $\vec{C}$ is a coefficient vector, $\vec{X}$ indicates the position vector of a Kestrel, and $r_{1}$ and $r_{2}$ are random numbers generated between 0 and 1.

(ii) Current Position. The current best position of the Kestrel is expressed as

$$\vec{X}(t+1) = \vec{X}_{p}(t) - \vec{A} \cdot \vec{D}. \quad (3)$$

Thus,

$$\vec{A} = 2z \cdot r_{1} - z, \quad (4)$$

where $\vec{A}$ is the coefficient vector, $\vec{D}$ is the encircling value obtained, $\vec{X}_{p}$ is the position vector of the prey, and $\vec{X}(t+1)$ represents the current best position of Kestrels. $z$ linearly decreases from 2 (upper bound value) to 0 (lower bound value) and is used to control the randomness over the iterations. $z$ is expressed as follows:

$$z = z_{\max} - itr \times \frac{z_{\max} - z_{\min}}{Max\_itr}, \quad (5)$$

where $itr$ is the current iteration and $Max\_itr$ represents the maximum number of iterations to terminate the search. $z_{\max}$ represents the higher bound value while $z_{\min}$ represents the lower bound value. Other Kestrels that are involved in the search update their position according to the best position of the leading Kestrel. Also, the change in position of a Kestrel in airstream depends on the frequency of bobbing, attractiveness, and trail evaporation. This is expressed as follows.

(a) Frequency of Bobbing. The frequency of bobbing is used for sight distance measurement in the search space. This is expressed as

$$f_{i} = f_{\min} + \left( f_{\max} - f_{\min} \right) \cdot \beta, \quad (6)$$

where $\beta$ is a random number generated between lower and upper end points to control the frequency of bobbing within a visual range, and $f_{\max}$ represents the maximum frequency and $f_{\min}$ the minimum frequency, both between 1 and 0, respectively.

(b) Attractiveness. Attractiveness indicates the light reflected from a trail, which is defined by

$$\beta(d) = \beta_{0} e^{-\gamma d}, \quad (7)$$

where $\beta_{0}$ represents the attractiveness at distance zero, $\gamma$ represents the variation of light intensity in the range $[0, 1]$, and $d$ represents the sight distance measurement, which is expressed using the Minkowski distance formulation as follows:

$$d_{ij} = \left( \sum_{k=1}^{n} \left| x_{i,k} - x_{j,k} \right|^{p} \right)^{1/p}. \quad (8)$$

Thus,

$$d_{ij} \le v, \quad (9)$$

where $x_{i}$ is the current sight measurement, $x_{j}$ are all potential neighboring sight measurements near $x_{i}$, $n$ is the total number of neighboring sights, $p$ is the order (1 or 2), and $v$ is the visual range.
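A small Python sketch of the sight distance and attractiveness computation in (7)-(9), under the reconstruction above (positions and parameter values are illustrative):

```python
import numpy as np

def minkowski(x_i, x_j, p=2):
    """Sight distance between two measurements, eq. (8); order p is 1 or 2."""
    return float(np.sum(np.abs(np.asarray(x_i) - np.asarray(x_j)) ** p) ** (1.0 / p))

def attractiveness(d, beta0=1.0, gamma=1.0):
    """Light reflected from a trail, eq. (7): decays as sight distance d grows."""
    return beta0 * np.exp(-gamma * d)

x_i, x_j, v = [0.0, 0.0], [0.6, 0.8], 1.0   # current sight, neighbor, visual range
d = minkowski(x_i, x_j, p=2)                # Euclidean order
if d <= v:                                  # eq. (9): neighbor within visual range
    print(attractiveness(d))                # nearer trails are more attractive
```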

(c) Trail Evaporation. A definition of a trail is the formation and maintenance of a line [31]. In metaheuristic algorithms, ants use trails both to trace the path to a food source and to prevent themselves from getting stuck at a single food source. Thus, ants, using these trails, can search many food sources in a search space [14]. As ants continue to search, trails are drawn and substances are deposited in the trail. These substances help ants to communicate with each other about the location of food sources. Therefore, other ants continuously follow this path and also deposit substances for the trail to remain fresh. Similar to ants, Kestrels use trails in search of food sources. However, these trails are rather deposited by prey, which provides an indication to Kestrels of the availability of food sources. The assumption is that the substances deposited by these types of prey are similar to substances deposited on ants' trails. Additionally, when the source of food depletes, Kestrels no longer follow this path. Consequently, the trail substance begins to diminish with time at an exponential rate, causing trails to become old. This diminishment denotes the unstable nature of the trail substances, which can be theoretically stated as follows: if there are unstable elements with an exponential decay rate $\gamma$, then an equation can be formulated to describe how the substance decreases in time [32]. This equation is expressed as follows:

$$N(t) = N_{0} e^{-\gamma t}. \quad (10)$$

In other words, since the substances are unstable, this introduces randomness into the decay process. Thus, the decay with time ($t$) is reexpressed as

$$N_{t} = N_{0} \times e^{-\gamma t}, \quad (11)$$

where $N_{0}$ is a random initial value of substance that is decreased at each iteration and $t = (1, 2, \ldots, T)$, where $T$ is the maximum number of iterations. The decay rate at time $t$, indicating a new trail or an old trail, is expressed as

$$\gamma(t) = e^{-\lambda t}. \quad (12)$$

Again, the decay constant $\lambda$ is expressed by

$$\lambda = \frac{\ln \left( S_{\max} / S_{\min} \right)}{t_{1/2}}, \quad (13)$$

where $\lambda$ is the decay constant, $S_{\max}$ is the maximum number of substances in the trail, $S_{\min}$ is the minimum number of substances in the trail, and $t_{1/2}$ is the half-life period of a trail, beyond which the trail is old and unattractive.
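The decay behavior in (10)-(13) can be sketched in the same hedged fashion; note how a trail older than its half-life becomes weakly attractive (the substance bounds and half-life below are illustrative values):

```python
import numpy as np

def decay_constant(s_max, s_min, half_life):
    """Decay constant lambda, eq. (13), from trail substance bounds and half-life."""
    return np.log(s_max / s_min) / half_life

def trail_substance(n0, gamma, t):
    """Remaining trail substance after t iterations, eqs. (10)-(11)."""
    return n0 * np.exp(-gamma * t)

# Example: a trail whose substance halves every 50 iterations.
lam = decay_constant(s_max=1.0, s_min=0.5, half_life=50)
print(trail_substance(n0=1.0, gamma=lam, t=100))  # old trail, weakly attractive
```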

Finally, the Kestrel updates its position using the following equation:

$$x_{t+1} = x_{t} + \beta \left( x_{b} - x_{t} \right) + f_{i}, \quad (14)$$

where $x_{t+1}$ is the current best position of the Kestrel, which represents the candidate solution, and $x_{t}$ is the previous position of the Kestrel. $\beta$ represents the attractiveness as expressed in (7). $x_{b}$ represents a Kestrel with a better position, while $f_{i}$ is the frequency of bobbing as expressed in (6).
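Putting (1)-(5) and (14) together, one movement step of a single Kestrel could be sketched in Python as follows; this is a hedged reading of the reconstructed equations, with illustrative function and parameter names rather than the authors' MATLAB code:

```python
import numpy as np

rng = np.random.default_rng(0)

def kestrel_step(x, x_prey, itr, max_itr,
                 z_max=2.0, z_min=0.0, f_min=0.0, f_max=1.0,
                 beta0=1.0, gamma=1.0):
    """One movement step of a Kestrel toward the prey (the best estimate so far)."""
    z = z_max - itr * (z_max - z_min) / max_itr   # eq. (5): linearly decreasing term
    r1, r2 = rng.random(2)
    A = 2.0 * z * r1 - z                          # eq. (4): coefficient vector
    C = 2.0 * r2                                  # eq. (2): coefficient vector
    D = np.abs(C * x_prey - x)                    # eq. (1): encircling value
    x_enc = x_prey - A * D                        # eq. (3): current best position
    f = f_min + (f_max - f_min) * rng.random()    # eq. (6): frequency of bobbing
    d = np.linalg.norm(x_prey - x)                # eqs. (8)-(9): sight distance (order 2)
    beta = beta0 * np.exp(-gamma * d)             # eq. (7): attractiveness
    return x_enc + beta * (x_prey - x) + f        # eq. (14): position update
```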

(iii) Fitness Function. The fitness function is used to evaluate how well the algorithm performs in terms of the quality of estimation. This performance is measured in terms of minimizing the deviation of data points from the estimated value without considering the direction (negative or positive) of the fitness value. Thus, the performance measurement method used the mean of absolute error (MAE) as the fitness function evaluation because it allows the model to fine-tune absolute values and improve the performance of values into much finer positive values without consideration of negative values. The MAE is expressed in

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_{i} - \hat{x}_{i} \right|, \quad (15)$$

where $x_{i}$ is the observed data point at the $i$th position in the sampled dataset, $\hat{x}_{i}$ is the estimated value at the $i$th position in the dataset, and $n$ is the number of data points in the sampled dataset. There are other evaluation methods such as root mean square error (RMSE) and mean square error (MSE).

The root mean square error (RMSE) measures the mean square error between the original data point and the estimated value. The RMSE is expressed as the square root of the variance (i.e., the standard deviation) in

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_{i} - \hat{x}_{i} \right)^{2}}, \quad (16)$$

where $x_{i}$ is the observed data point at the $i$th position in the sampled dataset, $\hat{x}_{i}$ is the estimated value at the $i$th position in the sampled dataset, and $n$ is the number of data points in the sampled dataset.

The mean square error (MSE) measures the square of the deviation between the estimated values and the actual data point for the variable being considered; the smaller the MSE value, the better the accuracy of estimation, and vice versa. The MSE is expressed in

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( x_{i} - \hat{x}_{i} \right)^{2}, \quad (17)$$

where $x_{i}$ is the observed data point at the $i$th position in the sampled dataset, $\hat{x}_{i}$ is the estimated value at the $i$th position in the sampled dataset, and $n$ is the number of data points in the sampled dataset.

The difference between the RMSE and the MSE is that MSE minimizes the error between the observed data and estimated value, but RMSE further minimizes the variance, while the mean of absolute error (MAE) measures the magnitude of errors without considering the direction of the fitness value.

In our comparison, we expressed the fitness function using the mean of absolute error as follows:

$$\mathrm{fitness} = \min \left( \frac{1}{n} \sum_{i=1}^{n} \left| x_{i} - \hat{x}_{i} \right| \right), \quad (18)$$

where $x_{i}$ is the observed data point at the $i$th position in the sampled dataset, $\hat{x}_{i}$ is the estimated value at the $i$th position in the dataset, and $n$ is the number of data points.
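The three error measures are straightforward to compute; a short sketch with illustrative observed and estimated values:

```python
import numpy as np

def mae(x, x_hat):
    """Mean of absolute error, eqs. (15)/(18): magnitude only, no direction."""
    return np.mean(np.abs(np.asarray(x) - np.asarray(x_hat)))

def mse(x, x_hat):
    """Mean square error, eq. (17)."""
    return np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2)

def rmse(x, x_hat):
    """Root mean square error, eq. (16): square root of the MSE."""
    return np.sqrt(mse(x, x_hat))

observed = [1.0, 2.0, 3.0]
estimated = [1.1, 1.8, 3.3]
print(mae(observed, estimated), mse(observed, estimated), rmse(observed, estimated))
```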

(iv) Velocity. The velocity of a Kestrel moving from its current best position in a changing airstream is

$$v_{t+1} = v_{t} + x_{t+1}, \quad (19)$$

where $v_{t+1}$ is the current best velocity, $v_{t}$ represents the initial velocity, and $x_{t+1}$ represents the best position of the Kestrel. The change in velocity is controlled by the inertia weight $\omega$ (which is also referred to as the convergence parameter). This inertia weight has a linearly decreasing value. The final velocity is thus expressed to include this inertia weight, as in

$$v_{t+1} = \omega v_{t} + x_{t+1}, \quad (20)$$

where $\omega$ is the convergence parameter, $v_{t}$ is the initial velocity, $x_{t+1}$ is the best position of the Kestrel, and $v_{t+1}$ is the current best velocity of the Kestrel. A Kestrel searching through the search space to find an optimal solution requires the continuous update of velocity, random encircling, and position towards the best estimate.

The proposed algorithm to implement KSA is expressed in Algorithm 1.

(i) Set parameters
(ii) Initialize population of Kestrels using equation (3) and evaluate fitness of population using equation (18)
(iii) Start iteration (loop until termination criterion is met)
  Compute half-life of trail using equation (11)
  Compute frequency of bobbing using equation (6)
  Evaluate position $x_{t+1}$ for each Kestrel using equation (14)
  If fitness($x_{t+1}$) is better than fitness($x_{t}$) then
    Move Kestrel towards $x_{t+1}$
  End if
  Update velocity and position for all Kestrels as in equation (20)
  Find the current best value
(iv) End loop
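A self-contained Python sketch of Algorithm 1, assembled from the reconstructed equations above, might read as follows; it omits the trail-decay and velocity bookkeeping of (10)-(13) and (19)-(20), and the population size, bounds, and toy fitness function are illustrative assumptions rather than the authors' MATLAB settings:

```python
import numpy as np

rng = np.random.default_rng(42)

def ksa(fitness, dim, n_kestrels=20, max_itr=500,
        z_max=2.0, z_min=0.0, f_min=0.0, f_max=1.0,
        beta0=1.0, gamma=1.0, bounds=(-10.0, 10.0)):
    """Minimal sketch of Algorithm 1 (KSA); parameter values are illustrative."""
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_kestrels, dim))       # (ii) initialize population
    fit = np.array([fitness(x) for x in X])               # evaluate fitness, eq. (18)
    best = X[np.argmin(fit)].copy()

    for itr in range(1, max_itr + 1):                     # (iii) iterate
        z = z_max - itr * (z_max - z_min) / max_itr       # eq. (5)
        for i in range(n_kestrels):
            r1, r2 = rng.random(2)
            A, C = 2 * z * r1 - z, 2 * r2                 # eqs. (4), (2)
            D = np.abs(C * best - X[i])                   # encircling, eq. (1)
            x_enc = best - A * D                          # eq. (3)
            f_i = f_min + (f_max - f_min) * rng.random()  # bobbing, eq. (6)
            beta = beta0 * np.exp(-gamma * np.linalg.norm(best - X[i]))  # eq. (7)
            x_new = x_enc + beta * (best - X[i]) + f_i    # position update, eq. (14)
            if fitness(x_new) < fit[i]:                   # move only if better
                X[i], fit[i] = x_new, fitness(x_new)
        best = X[np.argmin(fit)].copy()                   # current best value
    return best, fit.min()

# Toy usage: estimate a single missing entry by minimizing the MAE against
# the known values of the same attribute (illustrative data only).
known = np.array([4.1, 3.9, 4.0])
estimate, error = ksa(lambda x: np.mean(np.abs(known - x[0])), dim=1)
print(estimate, error)
```

In this toy usage, the fitness is the MAE of a single candidate value against the known entries of the same attribute, so the returned position serves as the estimate for the missing value.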

The Kestrel formulation also adopts the aspect of swarm behavior in terms of individual searching, moving to a better position, and fitness evaluation. However, what makes the Kestrel distinctive is the individual hunt through its random encircling of prey and its imitation of the best individual Kestrel. Since Kestrels hunt individually and imitate the best features of successful individual Kestrels, it is suggested that Kestrels are able to remember the best solution from a particular search space and continue to improve upon the initial solution until the final best is reached.

In comparison of the unique characteristics of the Kestrel algorithm with the Firefly, the wolf, and the BAT algorithms, the following can be stated: the Firefly algorithm is based on attractiveness, collective behavior, and brightness; the wolf algorithm is also based on attractiveness, collective behavior, and escape; the BAT algorithm is based on pulse rate and loudness; and the Kestrel algorithm is based on attractiveness, brightness of the trail which is dependent on the half-life period, and encircling. This encircling behavior allows Kestrels to be adaptable in searching multiple missing values within a particular search space. The basis for the comparison is to assess the interesting behavior of the newly developed algorithm (i.e., KSA) and show how different this newly developed algorithm is from previous algorithms.

4. Experimental Results

The proposed algorithm was implemented in MATLAB R2012a, and the quality of estimation was evaluated with the MAE method.

The initial parameters for KSA were set with visual range = 1. As expressed in (5), the lower and higher bounds were set as $z_{\min} = 0$ and $z_{\max} = 2$, respectively. Representative data was used to test our algorithm, and a maximum of 500 iterations/generations was used to give a greater chance to further refine the best value in each run. A sample set of data (of a given matrix problem dimension/scale) with multiple missing values in the row matrix was used in order to provide a thorough test of missing values in each row of the matrix. Figure 1 shows the test results.

Figure 1 depicts the single graph on the fitness value of KSA after 500 iterations. The curve ascends and descends steeply during the start of iteration and then gradually converges at the optimal solution at the end of the iterations. The nature of the curve at the initial iteration suggests that KSA quickly maximizes the search space and gradually minimizes until its convergence to a global optimum value of 0.007421.

Figure 2 depicts the comparison of fitness value of KSA as expressed in (18). During the comparison process, 500 iterations/generations were performed. The basis of this comparison is to demonstrate how the proposed algorithm performs in larger generations and thus allow the adequate refinement of the best fitness value.

The Wolf Search Algorithm with a step Minus Previous (WSAMP) [22] was used as a comparative algorithm because it allows wolves to have a memory of the previous best position that has been visited; hence, old positions are avoided when a new position is generated during the search, where Minus Previous means wolves can only remember up to one previously visited step. In WSAMP, the randomness parameter was set to 0.2 while the probability of escape from a local minimum was set to 0.25. Figure 2 shows the comparison of the fitness evaluation of the KSA and WSAMP algorithms, both using MAE as the fitness function.

Figure 2 shows the curve on comparison of the fitness evaluation of KSA with WSAMP in 500 iterations/generations. The fitness curve gradually slopes down and maintains constant fitness, indicating quick convergence at the start of iteration. Table 1 indicates the fitness values of the curve on both KSA and WSAMP.

Table 1 shows the comparative results of KSA with WSAMP. The resultant fitness values showed WSAMP attaining a lower minimum fitness value than KSA. Across the iterations that were performed, WSAMP maintained the minimum fitness values as compared to KSA. The results showed that WSAMP outperformed KSA in terms of the minimum error because WSAMP was able to remember the best position previously visited.

In BAT algorithm, both the loudness and the pulse rate were set to 0.5 without fine-tuning these parameters. Also, the arbitrary frequency range was set to a minimum of 0.2 and a maximum of 0.9. This frequency range determines the frequency scaling of a bat. The BAT algorithm was compared with KSA and the comparative curve of the fitness value is illustrated in Figure 3.

Figure 3 shows the curve on comparison of the fitness evaluation of KSA with BAT algorithm in 500 iterations/generations. The fitness curve for KSA peaks at the initial iteration and gradually slopes down and maintains a constant fitness value, thus indicating convergence. Table 2 indicates the fitness values of KSA and BAT algorithm.

Table 2 shows the comparative results of KSA with the BAT algorithm. The resultant fitness values show KSA having a fitness value of 0.0029716 while the BAT algorithm has a fitness value of 3.0326. The BAT algorithm, however, showed a horizontal line from the initial iteration to the end of the iterations. This suggests that the BAT algorithm was not able to converge to a global minimum, and that using the BAT algorithm to estimate values missing at random results in a high estimation error. Thus, KSA outperformed the BAT algorithm in finding the optimal value.

In the Firefly algorithm, the randomness (σ) and the absorption coefficient (γ) were set to 0.2 and 1.0, respectively. This setting allowed a small interval between the random numbers being generated. However, the randomness reduction was set to 0.97 (similar to an annealing schedule).

Figure 4 shows the comparison of KSA with the Firefly algorithm. The curve indicates that KSA converges to a global minimum after the end of the iterations, while the curve on the Firefly algorithm shows several local minimum curves during the start of iteration, and then the curve smooths until the final iteration, suggesting that the curve moves from a local minimum and then gradually lessens to a global minimum. Table 3 indicates the fitness values of both KSA and Firefly algorithm.

Table 3 shows the comparative results on KSA with the Firefly algorithm. The fitness value of KSA converges to a value of 0.0054204 while the Firefly algorithm converges to a fitness value of 1.0000. This suggests that KSA produces the minimum error when estimating missing values as compared with the Firefly algorithm.

The comparative curve of the fitness value is illustrated in Figure 5.

Figure 5 shows the curve between BAT algorithm and WSAMP; while the curve on BAT algorithm is constant, the curve on WSAMP gradually slopes until convergence. Table 4 shows the values of both BAT algorithm and WSAMP.

Table 4 shows the comparative results of the BAT algorithm with the WSAMP algorithm. The fitness value of WSAMP converges to a value of 1.4864e − 7 while the BAT algorithm was constant at 3.1703. This suggests that WSAMP produces the minimum error in estimating missing values as compared with the BAT algorithm.

Figure 6 shows the curve between BAT algorithm and Firefly algorithm. The curve on the Firefly algorithm shows several local minimum curves from the start of iteration to approximately the 150th iteration; thereafter, the curve maintained a constant horizontal line until the final iteration, suggesting a global minimum; the curve on BAT algorithm has a constant horizontal line from the start of iteration to the end of iteration. Table 5 shows the result of comparison between BAT algorithm and Firefly algorithm.

The fitness value of the Firefly algorithm converges to 1.000 while the BAT algorithm was constant at 3.1703. This suggests that the Firefly algorithm produces the minimum error when estimating missing values. Thus, the Firefly algorithm outperforms BAT algorithm in terms of minimum error.

Figures 7 and 8 show the curves of the WSAMP algorithm and the Firefly algorithm, respectively, so as to show clearly the nature of each curve.

Although both algorithms converged to a minimum, the fitness value of WSAMP converges to a much smaller value, while the curve of the Firefly algorithm shows several local minima and gradually lessens to a global minimum.

Table 6 shows the comparative results of WSAMP with the Firefly algorithm. While the fitness value of the Firefly algorithm converges to 1.000, the WSAMP algorithm converges to a far smaller value. This suggests that the WSAMP algorithm produces the minimum error. Therefore, the WSAMP algorithm outperforms the Firefly algorithm when estimating missing values. Table 7 shows a summary of the results on mean absolute errors obtained from each paired algorithm on the sample matrix problem dimension.

Table 7 shows a summary of the results where each column represents the MAE obtained from each iteration in the matrix. The results indicate that BAT algorithm performed poorly as compared to WSAMP and Firefly algorithm.

Different problem scales/dimensions of the dataset were applied to each algorithm and the corresponding fitness value (i.e., MAE) was computed. The dimensions that were selected help to observe the behavior of each algorithm at different problem scales. Table 8 shows the results on MAE values obtained from the comparative algorithms on the different problem dimensions.

Table 8 indicates that KSA obtained optimal results as compared with the BAT algorithm and the Firefly algorithm at different problem scales. The results suggest that KSA's random encircling makes it among the best of the compared algorithms at estimating multiple missing values in a big data analysis environment.

5. Statistical Analysis of Experimental Results

The basis for the statistical analysis of experimental results on the comparative algorithms is to find the significance of the results obtained from each algorithm. In order to achieve this comparison in an accurate manner, the study used a profile of all the test functions used in each of the algorithms and the MAE results (i.e., the quality of estimation) in Table 8. A nonparametric statistical procedure was used to analyze the significance of the comparison, as it makes no underlying assumptions on parameters such as the mean and variance of the algorithm being assessed. In contrast, parametric statistical procedures make assumptions on the parameters being assessed. In this section, the profiling of test functions and the nonparametric statistical procedure adopted for the analysis are discussed. The section is structured into two parts: statistical analysis on profiling of test functions and statistical analysis on MAE results (quality of estimation).

(i) Statistical Analysis on Profiling of Test Functions. Profiling is a technique used to measure time spent on aspects of a program such as a function [33]. This technique helps to optimize functions and improve on the performance of the algorithms. During profiling, the following are considered: function name, the number of times a function was called upon (i.e., calls), the total time spent on each function including subfunctions (total_time), and the total time spent on a function excluding the time spent on subfunctions (i.e., self_time). It is possible for functions that are less time intensive to call other functions that are time intensive. The profiling technique is important as it determines which functions are responsible for calling other functions.

The study applied profiling as a technique to extract functions and to group the functions into two categories, namely, major functions and basic functions. While the major functions are functions that were written to implement the behavior of the algorithms, the basic functions are the in-built functions that work alongside the major function. Table 9 indicates how the functions are grouped during profiling on each comparative algorithm.

Table 9 shows the different test functions in each comparative algorithm. The WSAMP has 2 major functions categorized into the main function (represented by f1) and a subfunction (represented by f2). All main functions are represented in each algorithm by f1. The Firefly algorithm has 6 major functions (one main function f1 and 5 subfunctions). The BAT algorithm consists of 3 major functions (one main function f1 and 2 subfunctions), while KSA consists of 4 major functions (one main function f1 and 3 subfunctions).

In order to have a true reflection on the nature of in-built functions that were extracted, all in-built functions were considered for analysis. It is observed that when some in-built functions were called, the total_time is zero seconds, thus making those in-built function calls inconsequential in terms of execution time. However, these inconsequential in-built functions were taken into consideration so as not to lose track of any function calls made.

A statistical procedure was applied to analyze the significance of the profile results obtained in Table 9. The study conducted a nonparametric statistical test to assess which of the algorithms has better performance in terms of the behavior of test function call time. The basis for the statistical analysis is to find out the significance of the profiled results from each algorithm. Reference [34] indicated that nonparametric or distribution-free statistical procedures help to perform pairwise comparisons on related algorithms even in the case where the sample size of a dataset is small, such as a sample size below 30. When there are multiple comparisons of algorithms, the Wilcoxon signed-rank test helps to rank algorithms and test how significantly the algorithms outperform each other [34]. In this article, the mean of the time to call test functions of each algorithm was used in order to test whether two algorithms are equivalent. In order to indicate the probability of error in the median of the two algorithms, the p value was used [35]. The advantage of the Wilcoxon test is that there is no need to make an assumption about the population of functions being used, since the Wilcoxon test guarantees about 95% efficiency (at the 0.05 level of significance) when the population is normally distributed; that is, for 500 observations on test function calls, the Wilcoxon signed-rank test is efficient to about 499 observations on test function calls. Reference [36] indicated that the Wilcoxon signed-rank test is analogous to the related-samples t-test; however, the t-test is unsuitable for this type of analysis, while the Wilcoxon test is suitable. Samples are related if one sample matches the other sample, while a rank is a number assigned to an individual sample according to its order in a list of algorithms. Thus, the Wilcoxon statistical technique helps to assign ranks to algorithms in order to identify the best ranked evolutionary algorithm behavior [37] and to determine the significance of each algorithm. The following steps are applied in computing the Wilcoxon signed-rank test.

Step 1. Compute the difference $d_{i}$ of each pair of samples in each algorithm. Any pairs with a difference of 0 are discarded.

Step 2. Find the absolute value $|d_{i}|$ of each difference.

Step 3. Rank the absolute differences from the lowest to the highest, attaching the sign of each difference (positive difference and negative difference) to its rank.
The sum of ranks is expressed by

$$W = \frac{n(n+1)}{2}, \quad (21)$$

where $n$ is the sample size.

Step 4. Compute the test statistic $T$. Thus, $T = \min(W^{+}, W^{-})$, where $W^{+}$ and $W^{-}$ are the sums of the positive and negative ranks, respectively. Thus, the test statistic is the smallest of the two values.

Step 5. Find the critical value based on the sample size $n$. If $T$ is less than or equal to the critical value at a level of significance (i.e., $\alpha = 0.05$), then a decision is made that the algorithms are significantly different [34]. In order to accomplish this, the Wilcoxon signed-rank table is consulted, using the critical value ($\alpha$) and sample size $n$ as parameters, to obtain the value within the table. If this value is less than the calculated value of the algorithmic comparison, this means that the algorithmic difference is significant.
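These steps are implemented in standard statistics libraries; for example, SciPy's wilcoxon ranks the absolute differences, sums the signed ranks, and returns the test statistic and p value (the paired MAE values below are hypothetical):

```python
from scipy.stats import wilcoxon

# Hypothetical paired MAE results from two algorithms on the same problems.
mae_algo_a = [0.0030, 0.0054, 0.0074, 0.0049]
mae_algo_b = [0.0001, 0.0002, 0.0001, 0.0003]

# wilcoxon() performs Steps 1-4 and returns the statistic T and the p value.
stat, p = wilcoxon(mae_algo_a, mae_algo_b)
print(stat, p)  # reject equivalence if p <= 0.05 (Step 5)
```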

In order to apply the Wilcoxon signed-rank test, an analysis was performed on the time to call a function name (both in-built functions and major functions) as follows.

(a) Time Analysis of the In-Built Function Calls. The performance of the comparative algorithms was based on test function call time differences between the self_time and total_time of in-built functions in each algorithm. Based on the steps in computing Wilcoxon signed-rank test, the time for the test function call per each algorithm is shown in Table 10.

From Table 10, $d$ represents the difference between the sum of total_time and the sum of self_time. It is observed that the sum of the signed positive ranks is 10 while the sum of negative ranks is 0. Since the sample size is less than 30 and the Wilcoxon signed-rank table shows that there is no critical region on the subfunctions at $\alpha = 0.05$, the Wilcoxon signed-rank table suggests that the in-built functions are equivalent. In terms of ranking of algorithms, WSAMP was ranked first while KSA was ranked second.

Since $n$ is small, it is tedious to find a critical value for small values of $n$. Table 11 shows the mean and standard deviation of the sum of self_time and the sum of total_time on the subfunctions.

Table 11 shows the sample size ($n = 4$) and the mean of the sum of self_time as 12.4398 and of the sum of total_time as 40.6387, with their corresponding standard deviations. Since the results show a standard deviation of 80.06121 on total_time, this suggests a high deviation of total_time on the in-built function calls as compared with the low standard deviation of 24.52552 on self_time in-built function calls. Thus, there is a high total_time spent on in-built function calls in the algorithms as compared with self_time on in-built function calls, meaning that the algorithms spent an intensive amount of time on in-built function calls, averaging 40.6387 including subfunction calls and 12.4398 excluding other in-built function calls. Table 12 illustrates the Wilcoxon signed-rank test between total_time and self_time of in-built function calls computed using the Statistical Package for the Social Sciences (SPSS). Table 12 contains the sample size (n), mean rank, and sum of ranks.

Table 12 shows the Wilcoxon signed ranks on the comparison of the sum of total_time and the sum of self_time. There were 4 sample pairs between total_time and self_time. The basis is to find whether the differences between total_time and self_time are significantly different from zero and whether the differences observed in the mean rank (0.00 versus 2.50) can be located in the population of in-built function calls. In order to locate the p value between the mean ranks (0.00 versus 2.50), the test of significance of time on the performance of in-built functions is computed as in Table 13.

Table 13 shows the test statistic that was obtained. The asymptotic sig. (2-tailed) in the table represents the p value for the test, while the Wilcoxon signed-rank test was computed using the Z statistic. Thus, the Wilcoxon signed-rank test was used on 4 samples to find out whether there is a significant change of total_time with respect to self_time, showing a p value of 0.068. Since p > 0.05 within the mean rank (0.00 versus 2.50), the p value of 0.068 indicates that, statistically, time did not result in a significant change in the performance of in-built function calls.

(b) Time Analysis on the Major Function Calls. In this study, we also conducted a statistical analysis to determine whether time might be significant in enhancing the performance of major function calls of each algorithm. Table 14 shows the summary of the Wilcoxon signed-rank test on major functions between self_time and total_time of each algorithm.

Table 14 shows the difference between the sum of total_time and the sum of self_time of each algorithm. It is observed that the sum of signed positive ranks is 4 while the sum of negative ranks is 0. Since the sample size is less than 30, and the Wilcoxon signed-rank table shows that there is no critical region at the significance level of $\alpha = 0.05$ to suggest that the major functions are significantly different in terms of the total_time and self_time of calling them, all the major functions of each algorithm were ranked equally. The Wilcoxon signed-rank test was then conducted to test whether there was a significant difference. Firstly, Table 15 indicates the mean and standard deviation of both the sum of self_time and the sum of total_time on the major functions.

Table 15 shows the sample size ($n = 4$) and the mean of the sum of self_time as 11.5742 and of the sum of total_time as 16.7973, with their corresponding standard deviations. The results indicate a standard deviation for total_time of 22.00520 and for self_time of 13.59744. Thus, there is a high variation in total_time as compared with the low variation in self_time of major function calls. Table 16 illustrates the Wilcoxon signed-rank test between total_time and self_time of major function calls.

Table 16 shows the Wilcoxon signed ranks on the comparison of the sum of total_time and the sum of self_time. There were 4 sample pairs between total_time and self_time. The basis of this comparison is to find out whether the differences between total_time and self_time are significantly different and whether the differences observed in the mean rank (0.00 versus 2.50) can be located in the population of major function calls. In order to locate the p value between the mean ranks (0.00 versus 2.50), the test of significance of time on performance is computed in Table 17.

Table 17 shows the test statistic that was obtained. The asymptotic sig. (2-tailed) in the table represents the p value for the test, while the Wilcoxon signed-rank test was computed using the Z statistic. The Wilcoxon signed-rank test was used to find whether there is a significant change between total_time and self_time at the 0.05 level of significance. The results indicate that, statistically, time did not result in a significant change in the performance of major function calls.

(ii) Statistical Analysis on Output Results on Quality of Estimation. The Wilcoxon signed-rank test was conducted on all the dimensions of the results on quality of estimation in Table 18.

Table 18 consists of all problem dimensions of each algorithm and the respective MAE. Although the null hypotheses were formulated based on dimension, the study tested the hypotheses on all problem dimensions of the MAE value. Earlier, the analysis of the performance of each paired algorithm was made as follows:
(1) WSAMP outperformed KSA in terms of the minimum error (MAE).
(2) KSA outperformed the BAT algorithm in finding the optimal value.
(3) KSA produces the minimum error when estimating missing values as compared with the Firefly algorithm.
(4) WSAMP produces the minimum error when estimating missing values as compared with the BAT algorithm.
(5) The Firefly algorithm outperforms the BAT algorithm in terms of the minimum error.
(6) The WSAMP algorithm outperforms the Firefly algorithm based on their MAE.

In order to test the significance of the quality of estimation, the Wilcoxon test statistic was computed using SPSS, and the p value is shown in Table 19.

Table 19 shows the p value obtained from each pair of compared algorithms. Since a p value must be less than or equal to the level of significance of 0.05 in order to be significant, the results in Table 19 show that the quality of estimation was significantly different between the paired algorithms. In this case, WSAMP significantly outperformed KSA in terms of the MAE.

Multiple Comparison of Output Results on Accuracy. Reference [38] indicated that the Wilcoxon signed-rank test is best used for pairwise comparisons between two algorithms. In a multiple comparison situation where two or more algorithms are compared, it is possible for errors to accumulate such that the performance of algorithms appears significant. Reference [34] indicated that performing multiple comparison enables correcting the Family-Wise Error Rate (FWER), which occurs after multiple algorithms are combined. In order to perform the comparison, [34] used the accuracy results obtained by the algorithms to perform statistical analysis on them. The statistical significance of combining a pair of algorithms is computed by

$$p = 1 - \prod_{i=1}^{k} \left( 1 - p_{H_{i}} \right), \quad (22)$$

where $p_{H_{i}}$ is the $p$ value of the $i$th of $k$ pairwise hypotheses. Using expression (22), the $p$ values of each algorithm are computed to find the final $p$ value. If the $p$ value is less than the critical value (e.g., α = 0.05), then it forms the basis for rejection of a hypothesis. However, a final decision cannot be made to fully reject or fail to reject (accept) a hypothesis based on an analysis result without performing a test on the possible error that could be accumulated when comparing algorithms.
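Expression (22), as reconstructed here, can be computed directly from the pairwise p values; the inputs below are hypothetical:

```python
import numpy as np

def combined_p(p_values):
    """Family-wise p value for a set of pairwise comparisons, expression (22)."""
    p_values = np.asarray(p_values)
    return 1.0 - np.prod(1.0 - p_values)

# Hypothetical p values from three pairwise Wilcoxon tests.
print(combined_p([0.03, 0.04, 0.02]))  # error accumulated across comparisons
```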

The Friedman test can be conducted to compare two or more evolutionary algorithms and find errors that have been accumulated when two or more algorithms are compared [39, 40]. The Friedman test is a two-way analysis of the variations in the ranking of algorithms. The Friedman test is a nonparametric procedure that aims to compare the median of a distribution in order to find whether significant differences between the behaviors of two or more algorithms have occurred. The null hypothesis of Friedman test applies the equality of medians [41], while the alternative hypothesis negates the null hypothesis. The Friedman test procedure can be summarized into the following steps.

Step 1. Rank the algorithms for each dataset separately.

Step 2. The best performing algorithm with the least MAE gets the rank of 1, the second best rank 2, and so forth.

Step 3. If there is a tie between ranks, assign the average rank. Let $r_{i}^{j}$ represent the rank of the $j$th of $k$ algorithms on the $i$th of $N$ datasets.

Step 4. The Friedman test compares the average ranks of the algorithms, given as follows:

$$R_{j} = \frac{1}{N} \sum_{i=1}^{N} r_{i}^{j}. \quad (23)$$

The null hypothesis states the equivalence of the ranks $R_{j}$, which is tested with the Friedman statistic [39], computed as

$$\chi_{F}^{2} = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_{j}^{2} - \frac{k(k+1)^{2}}{4} \right], \quad (24)$$

where $R_{j}$ is the average rank and $\chi_{F}^{2}$ is distributed with $k - 1$ degrees of freedom, such that $N$ and $k$ should be large enough (as a rule of thumb, $N > 10$ and $k > 5$) [41], since large sample sizes are significant in computing the degrees of freedom on the ranks of the algorithms. $k$ is the number of groups that are being compared.

Step 5. The calculated value of $\chi_{F}^{2}$ must be larger than or equal to the appropriate critical table value of the $\chi^{2}$ distribution, or larger than or equal to the corresponding value in the small-samples table.
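In practice, Steps 1-5 are available in statistics libraries; a sketch using SciPy's friedmanchisquare on illustrative MAE values (placeholders, not the values in Table 18):

```python
from scipy.stats import friedmanchisquare

# Hypothetical MAE values for four algorithms over six problem dimensions;
# each list holds one algorithm's results across the same datasets.
ksa     = [0.0030, 0.0054, 0.0074, 0.0049, 0.0061, 0.0043]
wsamp   = [0.0101, 0.0120, 0.0115, 0.0098, 0.0105, 0.0110]
firefly = [1.0000, 1.0000, 0.9998, 1.0001, 0.9999, 1.0000]
bat     = [3.0326, 3.1703, 3.1000, 3.2000, 3.0500, 3.1500]

# friedmanchisquare ranks the algorithms within each dataset and returns
# the chi-square statistic (k - 1 degrees of freedom) and its p value.
stat, p = friedmanchisquare(ksa, wsamp, firefly, bat)
print(stat, p)  # significant if p <= 0.05
```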

Reference [41] indicated that, to perform multiple comparison, two measures are used; firstly, check whether the results obtained from the algorithm have inequality and rank using the Friedman test. The Friedman test states that, under a null hypothesis, all the algorithms are equivalent, so a rejection of a hypothesis indicates existence of significant differences in performance of all the algorithms studied [41].

In our approach to identify the best algorithm (deemed to be the algorithm with the lowest ranking value) that can be used as a control algorithm, the results in Table 18 were applied and the Friedman test was conducted to identify the best algorithm. In order to rank the algorithms, the mean and standard deviation were computed (in Table 20) on the results on accuracy of performance shown in Table 18.

The results in Table 20 indicate that KSA has the least standard deviation among the comparative algorithms. The standard deviation measures the amount of variation in a set of data [42]: the larger the standard deviation, the greater the variation in the data, whereas the smaller the standard deviation, the smaller the variation. Since KSA has a minimum standard deviation of 0.00001, there is only a small variation in KSA. Based on the results in Table 20, the Friedman test ranked the algorithms in Table 21.

The rank in Table 21 indicates that KSA is the best algorithm among the comparative algorithms. Friedman's test statistic for this sample was computed in Table 22.

Table 22 shows the results of the Friedman test, where the obtained $\chi_{F}^{2}$ is 18.000, with 3 degrees of freedom and a significance (asymp. sig.) level of 0.0000. Since the significance level is α = 0.05, the computed value of $\chi_{F}^{2}$ must be larger than or equal to the critical value at a significance of 0.05. With 3 degrees of freedom at the 0.05 level of significance, the value read from the critical-value chi-square distribution table [43] is 7.82; thus 18 > 7.82 at α = 0.05. There is a significant difference in the results on the quality of estimation of missing values among the algorithms, meaning that the algorithms are not the same.

6. Conclusion and Future Work

This paper presented a new bioinspired algorithm, the Kestrel-Based Search Algorithm (KSA), and compared it with other metaheuristic algorithms, namely, the Firefly, WSAMP, and BAT algorithms. The results of the comparison showed that KSA demonstrated potential uniqueness in its search for the best value in comparison with these metaheuristic methods. The statistical test conducted on test function call time based on the Wilcoxon signed-rank test indicated that total_time did not result in a significant change in the performance of major function calls. Similarly, total_time did not result in a significant change in the performance of in-built function calls. Since the Wilcoxon test helps to assign ranks to algorithms in order to identify the best ranked evolutionary algorithms, the major functions and in-built functions were ranked. The results indicate that KSA was ranked second while WSAMP was ranked first in terms of in-built function calls. However, all algorithms were ranked equally in terms of major function calls. Further tests on the accuracy results in the multiple comparison applied the Friedman test to control the Family-Wise Error Rate. The Friedman test ranked KSA as the best algorithm, suitable as a control algorithm for multiple comparison of algorithms. In the future, we intend to apply this algorithm to real-world problems and to other big data purposes such as association rule mining, and to further enhance its performance in the estimation of missing values.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.