Mathematical Problems in Engineering

Volume 2018, Article ID 9457821, 16 pages

https://doi.org/10.1155/2018/9457821

## Bioinspired Computational Approach to Missing Value Estimation

^{1}ICT and Society Research Group, Department of Information Technology, Durban University of Technology, Durban, South Africa

^{2}Department of Computer and Information Science, University of Macau, Taipa, Macau

^{3}Department of Computer and Information Science, Bath Spa University, Bath, UK

Correspondence should be addressed to Richard C. Millham; richardmillham@hotmail.com

Received 26 June 2017; Revised 16 November 2017; Accepted 13 December 2017; Published 2 January 2018

Academic Editor: Erik Cuevas

Copyright © 2018 Israel Edem Agbehadji et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Missing data occurs when values of variables in a dataset are not stored. Estimating these missing values is a significant step in the data cleansing phase of big data management. Missing data may result from nonresponse or omitted entries, and if it is not handled properly, it may lead to inaccurate results during data analysis. Although traditional methods such as the maximum likelihood method can extrapolate missing values, this paper proposes a bioinspired method based on the behavior of birds, specifically the Kestrel. The paper describes the behavior and characteristics of the Kestrel and uses them to model an algorithm that estimates missing values. The proposed algorithm (KSA) was compared with the WSAMP, Firefly, and BAT algorithms. The results were evaluated using the mean of absolute error (MAE). Statistical tests (the Wilcoxon signed-rank test and the Friedman test) were conducted to compare the performance of the algorithms. The results of the Wilcoxon test indicate that time does not have a significant effect on performance and that the difference in quality of estimation between the paired algorithms was significant; the Friedman test ranked KSA as the best evolutionary algorithm.

#### 1. Introduction

Big data is defined by several characteristics, including velocity, volume, and value. Velocity relates to how fast incoming data needs to be processed and how quickly the receiver of information needs the results from the processing system [1]; volume relates to the amount of data that has to be processed; and value is what the user of big data management will gain from the data analysis. Other characteristics of big data include variety and veracity. Variety concerns the different structures that data may take, such as text and images, while veracity concerns the authenticity of the data source that is being used for decision-making. These characteristics have led to the use of innovative methods for decision-making, which may require combining different technological platforms for storage, such as Hadoop, NoSQL, and relational databases, to manage big data successfully. Datasets on these platforms may have values missing at random as a result of mismatched attributes [2] or omitted entries [3]. Hence, missing data is independent of the type of platform on which the data is placed. There are three categories of missing data: data missing completely at random (MCAR), data missing at random (MAR), and data missing not at random (MNAR) [4, 5], each of which is handled by a different method.

The missing completely at random (MCAR) category occurs when the missing values are randomly distributed throughout a matrix such that a missing value in a row is not dependent on any other row entry in the dataset [4]. In other words, neither the row entry which is missing nor any other row entry can predict whether a value is missing. When this happens, the chance that a value is missing does not depend on either the missing or the complete values in the same row of the matrix. The listwise method handles MCAR by removing every case that has one or more missing values; however, this removal can produce both biased parameters and incorrect estimates in analysis. The pairwise method is another way to handle MCAR; it addresses the missing value problem by computing each covariance estimate from all cases in which the relevant variables are observed. The pairwise deletion method assumes that all data is missing completely at random; therefore, cases with missing data on a variable are excluded from the computations that involve that variable. This deletion can cause errors in computation because each element of the covariance matrix may be computed from a different group of cases [6, 7].
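The difference between the two deletion methods can be illustrated with a minimal sketch. The toy data and the function names `listwise_rows` and `pairwise_cov` are hypothetical and are used only for illustration; `None` marks a missing value.

```python
def listwise_rows(rows):
    """Listwise deletion: keep only rows with no missing (None) entry."""
    return [r for r in rows if all(v is not None for v in r)]


def pairwise_cov(rows, i, j):
    """Pairwise deletion: sample covariance of columns i and j,
    using every row in which both columns are observed.
    Returns (covariance, number_of_rows_used)."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two rows with both columns observed")
    mean_i = sum(a for a, _ in pairs) / n
    mean_j = sum(b for _, b in pairs) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in pairs) / (n - 1)
    return cov, n


# Hypothetical toy data; None marks a missing value.
data = [
    (1.0, 2.0, 3.0),
    (2.0, None, 5.0),
    (3.0, 6.0, None),
    (4.0, 8.0, 9.0),
]

complete = listwise_rows(data)          # only the first and last rows survive
cov01, n01 = pairwise_cov(data, 0, 1)   # uses rows 0, 2, and 3
cov02, n02 = pairwise_cov(data, 0, 2)   # uses rows 0, 1, and 3
```

In this sketch, listwise deletion discards half of the rows, whereas pairwise deletion keeps three rows for each covariance but draws them from different subsets of the data, which is the source of the inconsistency noted above.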

The missing at random (MAR) category occurs when the missing value in a row of a matrix depends on another known row entry in the dataset. In other words, the missing value can be predicted from a previously known value in the dataset. Thus, the missing value is dependent on the previously known value. When this happens, it becomes easy to trace a pattern for a missing value in a row of a matrix. The traditional approach to handling MAR is the pairwise deletion method described previously.

The missing not at random (MNAR) category (also known as nonignorable nonresponse) occurs when the missing value in a row of a matrix depends on the other missing values in the row entry. When this happens, the known data cannot be used to estimate the missing value. Thus, the chance that the current value is detected as missing depends on the detection of previously missing values.
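The three categories can be contrasted with a small simulation. This is a sketch under the standard statistical reading of the categories (missingness independent of all values for MCAR, dependent on an observed variable for MAR, and dependent on the unobserved value itself for MNAR); the thresholds, data, and function names are hypothetical.

```python
import random


def mcar_mask(n, p, seed=0):
    """MCAR: every entry is missing with the same probability p."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(n)]


def mar_mask(x_observed, threshold):
    """MAR: y is missing whenever the fully observed covariate x exceeds a threshold."""
    return [x > threshold for x in x_observed]


def mnar_mask(y_true, threshold):
    """MNAR: y is missing whenever y itself (which is then unobserved) exceeds a threshold."""
    return [y > threshold for y in y_true]


x = [1, 2, 3, 4, 5, 6]           # always observed
y = [10, 60, 20, 50, 30, 40]     # subject to missingness

mar = mar_mask(x, 4)     # missingness of y is predictable from x
mnar = mnar_mask(y, 35)  # missingness of y depends on the hidden value of y
mcar = mcar_mask(len(y), 0.3)
```

Under MAR the mask can be recovered from the observed column `x`, whereas under MNAR it cannot be recovered from any observed data, which is why known values do not help in that category.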

Traditional approaches to handling missing data, such as listwise deletion (case deletion), pairwise deletion, and sample mean substitution (i.e., $k$-NN and $k$-means clustering [4, 8]), are, however, not efficient at providing optimal estimates for missing values. For instance, the sample mean substitution method requires that each data point clustered around a centroid be computed so as to find the best estimates. Thus, the number of clusters, the number of data points, and the dimensions involved in computing missing values make it inefficient. Pairwise deletion assumes that all data is missing at random and uses the average sample size to estimate its standard error, which results in either underestimation or overestimation of the standard error in the analysis of missing values; this makes it inefficient as well. In view of this, more efficient methods have been proposed to handle values missing at random, including maximum likelihood [9] and multiple imputation (for MAR), the expectation-maximization (EM) algorithm [4, 10], machine learning approaches (such as the autoencoder neural network), and metaheuristic algorithms such as genetic algorithms [11].

The maximum likelihood method estimates missing values by selecting a set of parameters or values that maximizes a likelihood function. The advantage of the maximum likelihood method is its consistency and unbiased estimation of the parameter closest to the observed value [12]. The expectation-maximization algorithm uses the maximum likelihood method to impute all missing values in a dataset [4]. This procedure uses probability to iteratively impute values, estimating an approximate parameter that moves closer to the missing value [10]. The iterative process generates a weighted value that is improved in each iteration until a termination condition is reached. However, when there are many variables and multiple missing values, the computational time increases at each iteration. On the other hand, the autoencoder neural network, or autoassociative multilayer perceptron, consists of an input and an output layer where the number of inputs is equal to the number of outputs [13]. When used to estimate approximate missing values, this network uses activation functions to map the sparse input space through hidden units to the output space. In other words, the activation function controls the input data from a dataset. During the estimation process, weight parameters are used in the hidden units of the network. These weight parameters are solved iteratively by maximizing their probability in the hidden units to produce the value that is closest to the missing value [5]. The advantage of the autoencoder neural network is that it gives reliable estimates of missing values; however, its efficiency depends on the number of hidden layers chosen: the higher the number of hidden layers, the more computational time is required to estimate the missing value(s).

Genetic algorithms are an evolutionary approach based on the survival of the fittest. This survival depends on the mechanism of “natural selection” (Darwin, 1868, as cited in [14]), where each element is represented using a binary string. A genetic algorithm is an adaptive search procedure [15], as cited by [14], which uses operators such as crossover, mutation, and selection to find a globally optimal solution. The search procedure starts with an initial guess and attempts to improve the guess through evolution [14] by comparing the fitness of the initial generation of the population with the fitness obtained after applying the operators to the current population, until the final optimal value is produced. This adaptive search procedure is an iterative process that eliminates weak individuals from a population by continuously updating the initial generation over multiple generations until the termination condition is reached. The adaptive search procedure helps to find approximate missing values [16] by optimizing an objective (fitness) function in a given search problem.

Particle swarm is a bioinspired method that is based on the swarm behavior of fish schools and bird flocks in nature [17]. The swarm behavior is expressed in terms of how particles adapt and make decisions on change of position within a space based on the position of other neighboring particles. The advantage of swarm behavior is that as an individual particle makes a decision, it leads to an emergent behavior [18]. This emergent behavior is the result of local interaction among particles in a problem space. Among the particle swarm algorithms for finding the best possible solutions in a problem space are the Firefly algorithm [19], bats [20], and cuckoo birds [21]. The successful characteristic of fireflies is the short and rhythmic flashes they produce [19]. This flashing light is used as a mechanism to attract mating partners and attract potential prey, and it serves as a warning to other fireflies. The signaling system of this flashing light mechanism is controlled by simplified basic rules underlining the behavior of fireflies. Unlike a genetic algorithm which uses operators such as mutation, crossover, and selection, the firefly uses attractiveness and brightness to improve certain individuals in its population. The similarity between the genetic algorithm and the firefly is that both generate an initial population and continue to update the initial population using fitness functions. The brighter fireflies attract those closest around them and the fireflies whose flashes fall below a given threshold are removed from the population, while the brightest fireflies form the next generation, and the generations/iterations continue until a select criterion is reached or the maximum number of generations is reached. The behavior of fireflies where a bright firefly attracts a firefly with a weaker brightness has been applied in missing data imputation by finding estimates of values closest to known values and then replacing these missing values with these estimates.

The Wolf Search Algorithm (WSA) is a bioinspired heuristic optimization algorithm based on the preying behavior of wolves [22]. The behavior of a wolf includes its ability to hunt independently by remembering its own trait (meaning wolves have memory); its ability to merge with a peer only when the peer is in a better position (meaning there is trust among wolves never to prey on each other); its ability to escape randomly upon the appearance of a hunter [22]; and its use of scent marks to demarcate its territory and communicate with other wolves [23]. This behavior enables wolves to adapt randomly to their environment when hunting. If a wolf finds a new, better position, the incentive to assume this new position is stronger provided that the position is already inhabited by a companion wolf. The wolf search algorithm is an iterative search process that starts with setting the initial parameters and randomly initializing the population, then evaluates and updates the current population using a fitness test, and continues creating new generations/iterations until some stopping criterion is met. Unlike the genetic algorithm, which uses operators such as mutation, crossover, and selection, or a particle swarm algorithm such as Firefly, which uses attractiveness and brightness, the wolf uses the attractiveness of prey within its visual range. Furthermore, wolves instinctively flock together in a pack, which collectively organizes the searches of the individual wolves. Therefore, the swarming behavior of WSA is delegated to each individual wolf, and this can form multiple leaders swarming from multiple directions towards the best solution rather than a single flock searching for an optimum in one direction at a time [22]. This swarm behavior of wolves can be used to estimate a value close to known values in a missing-at-random situation.

A variation of WSA is the Wolf Search Algorithm with Step Minus Previous (WSAMP). This WSAMP allows wolves to remember a previous best position and avoid the old positions which do not produce the best solution.

The BAT algorithm [24] is a bioinspired method based on the behavior of microbats in their natural environment. The unique behavior that characterizes bats is their echolocation mechanism, which helps them orient themselves and find prey within their environment. The search strategy of bats is controlled by the pulse rate and loudness of their echolocation. The change in the pulse rate helps to improve on the previous position, while the loudness alerts other bats to the best position that has been found [25]. The bat behavior has been applied to several optimization problems to find an optimal solution. The BAT algorithm search process starts with random initialization of the population, followed by evaluation of the new population using a fitness function and identification of the best population. Unlike the wolf algorithm, which uses the attractiveness of prey to govern its search, the BAT algorithm uses the pulse rate and loudness to control the search for the optimal solution.

Bioinspired search strategies are controlled by randomization, efficient local search, and the global best solution [24]. The contribution of this paper is that the random encircling behavior of certain birds, which is required to achieve an optimal solution for missing values, is first proposed as a new computational method and then compared with other metaheuristic algorithms, namely, the Wolf Search Algorithm with Step Minus Previous (WSAMP), the Firefly algorithm, and the BAT algorithm. The advantage of random encircling is that it maximizes the search space, thus creating a wider range from a hovering position in which to find the best possible solution. We also evaluated the quality of the proposed computational method, the random encircling of birds such as the Kestrel, using a fitness function.

The remainder of this article is organized as follows. In Section 2, we describe the behavior of Kestrel birds. Section 3 discusses the proposed computational model. The model consists of mathematical formulations on the Kestrel’s characteristics. Section 4 discusses the experimental results as well as comparisons of the proposed algorithm with the existing approach. Section 5 presents statistical analysis of experimental results. Section 6 contains conclusions and future work.

#### 2. Description of the Behavior of Kestrel Birds

The bioinspired algorithm is based on the behavior of Kestrel birds when hunting for prey. The Kestrel is a kind of bird that hunts by hovering (i.e., flight-hunt) or from a perch. These birds are strongly territorial and hunt individually [26, 27]. Reference [27] indicated that, during a hunt, Kestrels are imitative rather than cooperative. This suggests that Kestrels prefer not to communicate with each other but rather they imitate the behavior of other Kestrels with better hunting techniques and improve their hunting technique even though the hunting technique can change based on the type of prey, prevailing weather conditions, and energy requirements (for gliding or diving) [28].

During hunting, Kestrels use their eyesight to watch small and agile prey within their circling radius or coverage area referred to as the visual circling radius. The minute air disturbance from flying prey and the trail of urine and faeces from ground prey give an indication of the availability of prey. Once available prey is detected using these indications, the Kestrel positions itself to hunt. Kestrels are able to hover in changing airstream, maintain a fixed forward looking position with their eyes on the prey, and use random bobbing of the head to find the least distance between their position and the position of the prey. Also, the Kestrels possess excellent ultraviolet sensitive eyesight characteristics to visually locate trails because these trails of urine and faeces reflect ultraviolet light. Consequently, trails of prey such as voles become visible to Kestrels [29].

In hovering, Kestrels perform a wider search (global exploration) across territories within their visual circling radius, maintain a motionless position with a forward looking eye fixed on the prey, detect minute air disturbances from flying prey (particularly flying insects) to best position themselves to hunt the prey, and mostly move with precision through changing airstream.

Kestrels are able to flap their wings and adjust their long tails to stay in a place that is referred to as a still position in changing airstream. While in perch, mostly from high fixed structures, the Kestrel changes its perch every few minutes, performs a thorough search (a local exploitation using its individual hunt behavior) of its local territory with less energy requirements than a hovering hunt, and uses its ultraviolet sensitive capabilities to detect mammals such as voles closer to a perched area. This behavior suggests that, in perch, Kestrels conserve some of their energy and direct their ultraviolet sensitive capabilities to detect slowly moving prey on the ground. Moreover, an individual Kestrel with better perch and hovering skills in a wider search area stands a better chance to move faster on its prey or disperse sooner from its enemy than individual Kestrels that develop hunting skills in local territories [27]. Therefore, it is important to combine both types of hunting skills for a successful hunt. The characteristics of Kestrels are summarized as follows:

1. Soaring: this gives a larger search space (global exploration) within the visual coverage area.
    - (a) They maintain a still (motionless) position with eyesight fixed on the prey.
    - (b) They encircle the prey beneath with keen eyesight.
2. Perching: each Kestrel does a thorough search (local exploitation) within its visual coverage area.
    - (a) They perform frequent bobbing of the head.
    - (b) They get attracted to the prey using the detected visible trail and then glide to capture this prey.

The following assumptions are made on the characteristics:

- (i) The still position gives a near-perfect circle, and thus frequent changes in a circle direction depend on the position of the prey in shifting the center of its circling direction.
- (ii) Frequent bobbing of the head gives a degree of magnified or binocular vision that helps in measuring the distance to the prey, which then enables the Kestrel to move with a speed to strike.
- (iii) Attractiveness is proportional to light reflection; thus, the higher or the longer the distance from the Kestrel to the trail, the less the trail brightness. This distance rule applies to both hovering height and distance away from perch.
- (iv) New trails are more attractive than an old trail. Thus, trail decay or trail evaporation depends on the half-life of the trail.

#### 3. The Proposed Computational Model

The proposed computational model for Kestrel’s missing value estimation is based on the description of Kestrels’ behavior and characteristics. The following mathematical expressions depict the characteristics of the Kestrel.

*(i) Encircling*. Encircling is when the Kestrel randomly shifts (or changes) the center of its circling direction to recognize the current position of the prey. As the prey changes its current position, Kestrels randomly use the encircling behavior to encircle the prey. This movement of the prey determines the best possible position assumed by the Kestrel. The encircling [30] is expressed as

$$\vec{D} = \left| \vec{C} \cdot \vec{x}_p(t) - \vec{x}(t) \right|.$$

Thus,

$$\vec{C} = 2 \cdot \vec{r}_1,$$

where $\vec{C}$ is the coefficient vector, $\vec{D}$ is the encircling value obtained to indicate the best position, $\vec{x}_p(t)$ is the position vector of the prey, $\vec{x}(t)$ indicates the position vector of a Kestrel, and $\vec{r}_1$ and $\vec{r}_2$ are random numbers generated between 0 and 1.

*(ii) Current Position*. The current best position of the Kestrel is expressed as

$$\vec{x}(t+1) = \vec{x}_p(t) - \vec{A} \cdot \vec{D}.$$

Thus,

$$\vec{A} = 2 \cdot \vec{z} \cdot \vec{r}_2 - \vec{z},$$

where $\vec{A}$ is the coefficient vector, $\vec{D}$ is the encircling value obtained, $\vec{x}_p(t)$ is the position vector of the prey, $\vec{r}_2$ is a random number between 0 and 1, and $\vec{x}(t+1)$ represents the current best position of Kestrels. $\vec{z}$ linearly decreases from 2 (upper bound value) to 0 (lower bound value) and is used to control the randomness in each iteration. $\vec{z}$ is expressed as follows:

$$z = z_{hi} - \left( z_{hi} - z_{low} \right) \frac{\mathrm{itr}}{\mathrm{Max\_itr}},$$

where itr is the current iteration and Max_itr represents the maximum number of iterations to terminate the search. $z_{hi}$ represents the higher bound value while $z_{low}$ represents the lower bound value. Other Kestrels that are involved in the search update their position according to the best position of the leading Kestrel. Also, the change in position of a Kestrel in airstream depends on the frequency of bobbing, attractiveness, and trail evaporation. This is expressed as follows.
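A minimal sketch of the encircling and position steps is given below. It assumes the grey-wolf-style form suggested by the definitions above, namely an encircling value $D = |C \cdot x_p - x|$ with $C = 2 r_1$, a new position $x_p - A \cdot D$ with $A = 2 z r_2 - z$, and a control parameter $z$ that decreases linearly from 2 to 0. The function names and the per-dimension treatment of the random numbers are assumptions made for illustration.

```python
import random


def control_parameter(itr, max_itr, z_hi=2.0, z_low=0.0):
    """Linearly decrease z from z_hi (at itr = 0) to z_low (at itr = max_itr)."""
    return z_hi - (z_hi - z_low) * itr / max_itr


def encircle(prey, kestrel, z, rng):
    """One encircling step, applied independently to each dimension.

    D = |C * x_p - x| with C = 2 * r1, and x_new = x_p - A * D with
    A = 2 * z * r2 - z, where r1 and r2 are uniform in [0, 1).
    """
    new_position = []
    for x_p, x in zip(prey, kestrel):
        r1, r2 = rng.random(), rng.random()
        c = 2.0 * r1
        a = 2.0 * z * r2 - z
        d = abs(c * x_p - x)
        new_position.append(x_p - a * d)
    return new_position


rng = random.Random(7)  # fixed seed so the sketch is repeatable
prey = [1.0, -2.0]
kestrel = [5.0, 5.0]
for itr in range(1, 11):
    z = control_parameter(itr, 10)
    kestrel = encircle(prey, kestrel, z, rng)
```

Because $A$ lies in $[-z, z]$, early iterations (large $z$) allow wide moves around the prey, while at the final iteration $z = 0$ forces $A = 0$ and the Kestrel lands exactly on the prey position.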

*(a) Frequency of Bobbing*. The frequency of bobbing is used for sight distance measurement in the search space. This is expressed as

$$f_{t+1}^{k} = f_{\min} + \left( f_{\max} - f_{\min} \right) \alpha, \tag{6}$$

where $\alpha \in [0, 1]$ is a random number generated between the lower and upper end points to control the frequency of bobbing within a visual range. $f_{\max}$ represents the maximum frequency and $f_{\min}$ is the minimum frequency, set to 1 and 0, respectively.

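*(b) Attractiveness*. Attractiveness indicates the light reflected from a trail, which is defined by

$$\beta(r) = \beta_0 e^{-\gamma r^{2}}, \tag{7}$$

where $\beta_0$ represents the attractiveness, $\gamma$ represents the variation of light intensity in the range $[0, 1]$, and $r$ represents the sight distance measurement $d(x_i, x_c)$, which is expressed using the Minkowski distance formulation as follows:

$$d(x_i, x_c) = \left( \sum_{k=1}^{n} \left| x_{i,k} - x_{c,k} \right|^{p} \right)^{1/p}.$$

Thus,

$$V \le d(x_i, x_c),$$

where $x_i$ is the current sight measurement, $x_c$ are all potential neighboring sight measurements near $x_i$, $n$ is the total number of neighboring sights, $p$ is the order (1 or 2), and $V$ is the visual range.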
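The attractiveness and the sight distance can be sketched as follows, assuming a firefly-style attractiveness $\beta_0 e^{-\gamma r^2}$ and the usual Minkowski distance of order 1 or 2. The function names and the sample values are hypothetical.

```python
import math


def minkowski(a, b, order=2):
    """Minkowski distance of the given order (1 = Manhattan, 2 = Euclidean)."""
    return sum(abs(p - q) ** order for p, q in zip(a, b)) ** (1.0 / order)


def attractiveness(beta0, gamma, r):
    """Attractiveness beta0 * exp(-gamma * r^2): it falls off with distance r."""
    return beta0 * math.exp(-gamma * r * r)


current = (0.0, 0.0)
neighbor = (3.0, 4.0)
r = minkowski(current, neighbor, order=2)       # Euclidean distance
beta = attractiveness(beta0=1.0, gamma=0.5, r=r)
```

A larger distance or a larger $\gamma$ makes the trail less attractive, which matches the assumption that trail brightness decreases with the distance from the Kestrel to the trail.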

*(c) Trail Evaporation*. A definition of a trail is the formation and maintenance of a line [31]. In metaheuristic algorithms, ants use trails both to trace the path to a food source and to prevent themselves from getting stuck in a single food source. Thus, ants, using these trails, can search many food sources in a search space [14]. As ants continue to search, trails are drawn and substances are deposited in the trail. These substances help ants to communicate with each other about the location of food sources. Therefore, other ants continuously follow this path and also deposit substances for the trail to remain fresh. Similar to ants, Kestrels use trails in search of food sources. However, these trails are rather deposited by prey, which provides an indication to Kestrels on the availability of food sources. The assumption is that the substances deposited by these types of prey are similar to substances deposited on ants’ trails. Additionally, when the source of food depletes, Kestrels no longer follow this path. Consequently, the trail substance begins to diminish with time at an exponential rate, causing trails to become old. This diminishment denotes the unstable nature of the trail substances, which can be theoretically stated as follows: if there are $N$ unstable elements with an exponential decay rate *γ*, then an equation can be formulated to describe how $N$ decreases in time $t$ [32]. This equation is expressed as follows:

$$\frac{dN}{dt} = -\gamma N.$$

In other words, since the substances are unstable, this introduces randomness in the decay process. Thus, the decay rate ($\gamma$) with time ($t$) is reexpressed as

$$\gamma_t = \gamma_0 e^{-\lambda t},$$

where $\gamma_0$ is a random initial value of substance that is decreased at each iteration and $t$ is the number of iterations or time steps, with $t \in [0, \mathrm{Max\_itr}]$, where Max_itr is the maximum number of iterations. The decay rate $\gamma_t$ at time $t$ indicates whether a trail is new or old. Again, the decay constant *λ* is expressed by

$$\lambda = \frac{\phi_{\max} - \phi_{\min}}{t_{1/2}},$$

where $\lambda$ is the decay constant, $\phi_{\max}$ is the maximum number of substances in the trail, $\phi_{\min}$ is the minimum number of substances in the trail, and $t_{1/2}$ is the half-life period of a trail, which shows that the trail is old and unattractive.
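Trail evaporation can be sketched as an exponential decay over iterations. The sketch assumes that the trail value follows $\gamma_t = \gamma_0 e^{-\lambda t}$ and that the decay constant is the difference between the maximum and minimum number of substances divided by the half-life period; both the formula for the constant and the sample values below are assumptions made for illustration.

```python
import math


def decay_constant(phi_max, phi_min, half_life):
    """Assumed form of the decay constant: (phi_max - phi_min) / half_life."""
    return (phi_max - phi_min) / half_life


def trail_value(gamma0, lam, t):
    """Trail value after t iterations: gamma0 * exp(-lam * t)."""
    return gamma0 * math.exp(-lam * t)


lam = decay_constant(phi_max=1.0, phi_min=0.0, half_life=10.0)
gamma0 = 0.8  # hypothetical initial value of the trail substance
trail = [trail_value(gamma0, lam, t) for t in range(0, 50, 10)]
```

The values in `trail` decrease monotonically, so a trail that is not refreshed becomes steadily less attractive as the iterations proceed.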

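Finally, the Kestrel updates its position using the following equation:

$$x_{t+1}^{k} = x_{t}^{k} + \beta_0 e^{-\gamma r^{2}} \left( x_j - x_i \right) + f_{t+1}^{k},$$

where $x_{t+1}^{k}$ is the current best position of the Kestrel, which represents the candidate solution, and $x_{t}^{k}$ is the previous position of the Kestrel. $\beta_0 e^{-\gamma r^{2}}$ represents the attractiveness as expressed in (7). $x_j$ represents a Kestrel with a better position than Kestrel $x_i$, while $f_{t+1}^{k}$ is the frequency of bobbing as expressed in (6).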
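The position update can be sketched as a move toward a better-placed Kestrel, scaled by the attractiveness, plus the frequency of bobbing. The sketch assumes the form $x + \beta_0 e^{-\gamma r^2}(x_j - x) + f$ with a Euclidean sight distance and a frequency drawn as $f_{\min} + (f_{\max} - f_{\min})\alpha$; the function names and parameter values are hypothetical.

```python
import math
import random


def bobbing_frequency(f_min, f_max, rng):
    """Draw f = f_min + (f_max - f_min) * alpha with alpha uniform in [0, 1)."""
    return f_min + (f_max - f_min) * rng.random()


def update_position(x, x_better, beta0, gamma, f):
    """Move x toward x_better, scaled by the attractiveness, then add f."""
    r_squared = sum((a - b) ** 2 for a, b in zip(x, x_better))  # squared Euclidean distance
    beta = beta0 * math.exp(-gamma * r_squared)
    return [a + beta * (b - a) + f for a, b in zip(x, x_better)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
f = bobbing_frequency(0.0, 1.0, rng)
new_x = update_position([1.0, 1.0], [3.0, 2.0], beta0=1.0, gamma=0.1, f=f)
```

With $\gamma = 0$ and $f = 0$ the Kestrel jumps directly onto the better position; a positive $\gamma$ shortens the move as the distance grows, and the frequency term adds a small random perturbation.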

*(iii) Fitness Function*. The fitness function is used to evaluate how well the algorithm performs in terms of the quality of estimation. This performance is measured in terms of minimizing the deviation of data points from the estimated value without considering the direction (negative or positive) of the fitness value. Thus, the performance measurement method used the mean of absolute error (MAE) as the fitness function because it treats positive and negative deviations alike, so the model fine-tunes a nonnegative error value. The MAE is expressed as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| o_i - x_i \right|,$$

where $o_i$ is the observed data point at the $i$th position in the sampled dataset, $x_i$ is the estimated value at the $i$th position in the dataset, and $n$ is the number of data points in the sampled dataset. There are other evaluation methods such as root mean square error (RMSE) and mean square error (MSE).

The root mean square error (RMSE) measures the square root of the mean squared deviation between the original data points and the estimated values. The RMSE, the square root of the variance of the errors (i.e., their standard deviation), is expressed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( o_i - x_i \right)^{2}},$$

where $o_i$ is the observed data point at the $i$th position in the sampled dataset, $x_i$ is the estimated value at the $i$th position in the sampled dataset, and $n$ is the number of data points in the sampled dataset.

The mean square error (MSE) measures the square of the deviation between the estimated values and the actual data points for the variable being considered; the smaller the MSE value, the better the accuracy of estimation, and vice versa. The MSE is expressed as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( o_i - x_i \right)^{2},$$

where $o_i$ is the observed data point at the $i$th position in the sampled dataset, $x_i$ is the estimated value at the $i$th position in the sampled dataset, and $n$ is the number of data points in the sampled dataset.

The difference between the RMSE and the MSE is that MSE minimizes the error between the observed data and estimated value, but RMSE further minimizes the variance, while the mean of absolute error (MAE) measures the magnitude of errors without considering the direction of the fitness value.

In our comparison, we expressed the fitness function using the mean of absolute error as follows:

$$\mathrm{fitness} = \frac{1}{n} \sum_{i=1}^{n} \left| o_i - x_i \right|,$$

where $o_i$ is the observed data point at the $i$th position in the sampled dataset, $x_i$ is the estimated value at the $i$th position in the dataset, and $n$ is the number of data points.
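The three error measures can be compared on a small example. The sketch below uses the standard definitions of MAE, MSE, and RMSE; the observed and estimated values are hypothetical.

```python
import math


def mae(observed, estimated):
    """Mean of absolute error: average magnitude of the deviations."""
    return sum(abs(o - x) for o, x in zip(observed, estimated)) / len(observed)


def mse(observed, estimated):
    """Mean square error: average squared deviation."""
    return sum((o - x) ** 2 for o, x in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(observed, estimated))


observed = [1.0, 2.0, 3.0]
estimated = [2.0, 2.0, 1.0]   # hypothetical estimates of the observed values

fitness = mae(observed, estimated)
```

Here the deviations are $-1$, $0$, and $2$, so the MAE is $1$, while the MSE is $5/3$ because squaring weights the larger deviation more heavily.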

*(iv) Velocity*. The velocity of a Kestrel moving from its current best position in a changing airstream is

$$v_{t+1}^{k} = v_{t}^{k} + x_{t}^{k},$$

where $v_{t+1}^{k}$ is the current best velocity, $v_{t}^{k}$ represents the initial velocity, and $x_{t}^{k}$ represents the best position of the Kestrel. The change in velocity is controlled by the inertia weight $\omega$ (which is also referred to as the convergence parameter). This inertia weight has a linearly decreasing value. The final velocity is thus expressed to include this inertia weight as

$$v_{t+1}^{k} = \omega v_{t}^{k} + x_{t}^{k},$$

where $\omega$ is the convergence parameter, $v_{t}^{k}$ is the initial velocity, $x_{t}^{k}$ is the best position of the Kestrel, and $v_{t+1}^{k}$ is the current best velocity of the Kestrel. A Kestrel's search through the search space for an optimal solution requires the continuous update of the velocity, random encircling, and position towards the best estimate.
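The velocity update with a linearly decreasing inertia weight can be sketched as follows. The sketch assumes the form $\omega v + x$ for the new velocity; the start and end values of the inertia weight (0.9 and 0.4) are hypothetical choices and are not taken from the experiments.

```python
def inertia_weight(itr, max_itr, w_start=0.9, w_end=0.4):
    """Linearly decrease the inertia weight from w_start to w_end."""
    return w_start - (w_start - w_end) * itr / max_itr


def update_velocity(velocity, best_position, w):
    """New velocity: inertia-weighted old velocity plus the best position."""
    return [w * v + x for v, x in zip(velocity, best_position)]


velocity = [1.0, 2.0]
best_position = [0.5, 0.5]
velocity = update_velocity(velocity, best_position, w=0.5)
```

A smaller inertia weight damps the contribution of the previous velocity, so later iterations are influenced more by the best position found than by earlier movement.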

The proposed algorithm to implement KSA is expressed in Algorithm 1.