Abstract

This paper presents a new method for tracking maneuvering flying vehicles in color video sequences using a deformable contour model. The proposed approach concentrates on targets with maneuvering motion in the sky, which involves fundamental aspect change stemming from 3D rotation of the target or the video camera. To segment and track the aircraft in a video, the target contour is first initialized manually in a key frame and is then matched and tracked automatically in the subsequent frames. Generally, active contour models employ a set of energy functions based on edge, texture, color, and shape features, and the resulting objective function is minimized iteratively to track the target contour. In the proposed algorithm, we employ a Game of Life cellular automaton to manage the deformation of snake pixels (snaxels) in each epoch of the minimization procedure. Furthermore, to cope with the large aspect change of aircraft, a Gaussian model is used to represent the target color in RGB space. To compensate for changes in the luminance and chrominance components of the target, the prior distribution function is dynamically updated during tracking. The proposed algorithm is evaluated on the collected dataset, and the expected probability of tracking error is calculated. Experimental results show that the proposed algorithm reduces the probability of tracking error relative to the compared active contour methods.

1. Introduction

The video-based locating and tracking of flying vehicles is an important problem in the visual control of aerial systems, with applications in aerial surveillance and in the navigation of flying robots, missiles, microflying vehicles, unmanned aircraft, and so forth. Several approaches have been presented recently to localize, track, and recognize flying vehicles. In this context, four main state-of-the-art methodologies are well known and applicable: (1) invisible spectrum-based methods such as radio detection and ranging (RADAR) or light detection and ranging (LIDAR); (2) visible spectrum-based approaches [18], which operate in the wavelength range of 380 nm to 780 nm, together with algorithms for infrared and thermal imaging systems that extend beyond this range into the far infrared; (3) global positioning system (GPS)-based methods; and (4) combinations of visible and invisible spectrum-based methods. The feasibility of each category depends mostly on the distance between the imaging system and the target of interest. Furthermore, each flying vehicle has a set of flight specifications such as the minimum and maximum speed, maneuvering capability, flight board, and so on, whose data may help to estimate the position of the target accurately.

Irrespective of such additional information, this paper focuses on flying vehicle tracking (FVT) in color video sequences. In this domain, we concentrate especially on maneuvering aircraft with large aspect change, which is considered to be a challenging issue. Given the high deformation of the target contour in this application, deformable surfaces such as active contour [9-11] and active mesh [1, 12] models are a suitable choice.

1.1. Related Work

In [1], a method for flying vehicle tracking in monochrome video sequences was introduced. In this method, the initial location of the target was determined manually; then, the target was tracked automatically by optimizing the mesh energy functions. In [2], a vision-based scheme for automatically locating a flying vehicle was presented, which extracts fuzzified edge features, matches edge pyramids, and uses a motion flow vector classifier based on a multilayer perceptron (MLP) neural network. Ha et al. [3] introduced a method for real-time tracking of flying targets that combines the geometric active contour model with an optical flow algorithm. Haker et al. [4] used a segmentation method based on adaptive thresholding and a Bayesian statistical approach to identify and track the target location. In [10], an active contour model was utilized for vehicle target tracking. To deal with the high aspect change problem, the authors handled the motion model around the vehicle to reduce the tracking error in monochrome video frames. Yilmaz [13] suggested an object tracking method based on an asymmetric kernel and the mean shift approach, in which the scale and orientation of the kernel were changed adaptively. Molloy and Whelan [14] introduced an active mesh system for the motion tracking of rigid objects to alleviate one of the main problems in active contour models, that is, snake initialization. Jian and Jie [15] proposed an approach to track small objects in infrared streams using the mean shift estimator algorithm. In [16], a method based on edge feature extraction and matching together with a Kalman filter was suggested to detect and track mostly rigid objects, such as land vehicles, in video sequences captured by moving video cameras. This method determined the camcorder motion model by using a planar affine transformation. Lee et al. [17] suggested a deformable contour model based on a frame difference map. In [18, 19], an approach for flying vehicle tracking based on the snake model was proposed. The authors utilized a Kalman filter and an energy function, the so-called electric field, to handle large displacements of the target during tracking.

1.2. Motivation

Basically, rotation of the target around the roll, yaw, and pitch axes results in the maneuvering motion of a flying vehicle in the sky. Additionally, the 3D rotation of the aircraft or the camera causes aspect change of the target during the tracking process. Figure 1 shows the aspect change problem in a typical video sequence. For targets with aspect change, some parts of the target appear over the course of the video frames while other parts disappear. The aspect change may also alter the luminance and chrominance components of the target. Therefore, traditional optical flow approaches [20], methods based on feature matching [21], and model-based approaches [16] fail in tracking targets with large aspect changes. Furthermore, because of the maneuvering characteristics of the target, its path cannot be determined using algorithms based on estimation or data association [22]. In this paper, we focus on tracking aircraft targets with maneuvering motion and full aspect change and design a new scheme to alleviate the general problems in this area.

Among the methods in the literature for visually locating and tracking maneuvering aircraft, approaches such as [10, 16] take partial aspect change into account. However, their efficiency has not been demonstrated for targets with full aspect change, where the target shape and view change completely. In this context, other issues are also challenging, such as varying atmospheric conditions, changes in luminance and chrominance, image noise, dynamic scenes due to camera motion, shrinking target size, and reduced scene visibility. Some of these problems, like the changes in luminance and chrominance and in the size of the target, largely originate from the aspect change phenomenon.

To test the algorithm in the situations mentioned above and to provide a standard and informative dataset, we collected videos covering various conditions in addition to the aspect change phenomenon.

To cope with these problems, we propose an algorithm based on the active contour model for tracking maneuvering targets with full aspect change. The algorithm includes the following structural features.
(i) A deformable active contour model is designed to track maneuvering aircraft targets with full aspect change.
(ii) A new set of external energy functions is defined in RGB color space to enhance tracking efficiency.
(iii) A Game of Life (GoL) cellular automaton is proposed to manage, arrange, and smooth the deformation of snake pixels (snaxels) in each epoch of minimization.
(iv) A parametric, multivariate, and unimodal Gaussian model is utilized to dynamically update the color distribution of the target of interest in color video frames.

The rest of the paper is organized as follows: Section 2 summarizes the suggested method, and in its subsections, we discuss the deformable contour model, energy minimization and GoL-based contour updating, respectively. Experimental results appear in Section 3, and we conclude the paper in Section 4.

2. Proposed Method

Figure 2 depicts the block scheme of the proposed aircraft tracking algorithm. First, the outer contour of the target is localized manually in the initial key frame. Then, a parametric, multivariate, and unimodal Gaussian model is calculated based on the central limit theorem (CLT) to represent the target color distribution in RGB space. After contour initialization, the tracking algorithm starts by employing the parameters of the single Gaussian model (SGM), that is, the mean vector and covariance matrix, to find the location of the target contour in the current frame. Thereafter, the objective energy function is defined and optimized by means of a constrained greedy minimization procedure. To manage the deformation of snaxels, we introduce a GoL cellular automaton, which is applied in each epoch of the optimization routine. To deal with the aspect change phenomenon appropriately, the dynamic parametric model is updated in each frame after the control points become fixed and/or the maximum number of iterations is reached.

2.1. Determining Target Boundaries Using Color Matching Algorithm

In the proposed method, a 2D active contour model is utilized to represent the outer boundary of the target during tracking algorithm. The structure of deformable contour is organized by superposition of predefined energy functions. These functions are minimized using an optimization algorithm.

One of the difficulties in tracking high-speed targets using an active contour is the problem of local minima in the energy minimization algorithm. Therefore, we utilize the color distribution of the target to find its initial position. For the first frame of the video, we localize the contour of the target manually. This identifies the region of interest (ROI) of the target. Figure 3 shows the ROI of a target and the histograms of its color channels.

To determine the initial position of the contour in subsequent frames, a search region (SR) is defined to track the flying vehicle by matching and deforming the contour model. We consider the coordinates of SR as and , where The vector is a confidence margin for the target displacement. This margin may be estimated based on the interframe motion of the target. It can be assumed constant if the speed of the aircraft is approximately uniform. In our simulations, we consider the confidence margin as .

To determine the initial location of the contour in frame , a parametric, unimodal Gaussian model is considered to represent the color distribution of the target in RGB space. The color distribution of the target is represented by a multivariate SGM as where is the color vector, is the frame index, defines pixels coordinates, and the function represents a multivariate normal distribution as follows: Here is the dimension of the color space, which is assumed to be 3 for RGB color space. The mean vector and covariance matrix are determined by the color information of pixels located inside the contour of the target in the frame as

The initial location of the target contour in the current frame is determined using a color matching algorithm. The algorithm includes the following stages.
(1) Determine the color matching threshold, , by means of prior color distribution as where the parameter is a constant number in the range of [0, 1], and the vector includes the diagonal elements of matrix . The notation denotes the ceiling function.
(2) Determine pixel color consistency for all pixels in SR in the current frame by measuring the Mahalanobis distance:
(3) Constitute binary color consistency image , by applying a threshold on : where "1" values show image pixels that are consistent with the target color distribution.
(4) Suppress small regions in by applying a size-filtering algorithm. In the size filtering, distinct patches that have an area less than the threshold, , are omitted: in which the factor is equal to , and the parameters and denote the height and width of SR, respectively.
(5) Extract the outer boundary of consistent pixels in as the initial contour position in the current frame.

The final contour of the target is determined using the 2D active contour model.
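The color matching stages above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the 8-connected patch definition, and the way the threshold and minimum area are passed in as plain parameters are assumptions made for illustration, since the threshold and size-factor formulas are not reproduced here.

```python
import math

def fit_gaussian(pixels):
    """Sample mean vector and 3x3 covariance matrix of a list of (R, G, B) tuples."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in pixels) / (n - 1)
            for b in range(3)] for a in range(3)]
    return mean, cov

def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (assumes the matrix is non-singular)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of color vector x from the Gaussian (mean, cov)."""
    diff = [x[c] - mean[c] for c in range(3)]
    return math.sqrt(sum(diff[a] * cov_inv[a][b] * diff[b]
                         for a in range(3) for b in range(3)))

def consistency_mask(region, mean, cov_inv, threshold):
    """Binary image: 1 where a pixel's distance to the target model is within threshold."""
    return [[1 if mahalanobis(px, mean, cov_inv) <= threshold else 0 for px in row]
            for row in region]

def size_filter(mask, min_area):
    """Remove 8-connected patches of 1s whose area is below min_area (assumed connectivity)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            stack, patch = [(r, c)], []
            seen[r][c] = True
            while stack:  # flood fill one patch
                y, x = stack.pop()
                patch.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(patch) >= min_area:
                for y, x in patch:
                    out[y][x] = 1
    return out
```

In this sketch, the outer boundary of the filtered mask would then serve as the initial contour, and the model would be refitted from the pixels inside the final contour of each frame to follow the dynamic update described in the text.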

2.2. Active Contour Energy

The active contour model in the search region is considered as a closed contour with -ordered snaxels, , in which denotes the control point or snaxel. The contour energy in the active contour model is defined as the sum of energy functions for different snaxels of the contour:

The snaxel energy consists of two parts including internal and external energies: The internal energy function defines shape characteristics of the contour and is defined using the following [23]: where and are the normalized continuity and curvature energies for the snaxel , respectively. The continuity and curvature energy functions are defined as Here, the domain describes Moore 9-neighborhood as shown in Figure 4, and the parameter is the average distance between two neighboring snaxels. The parameter is updated at the beginning of each epoch of energy minimization routine.
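As an illustration of the internal energy, the following Python sketch evaluates continuity and curvature terms over the Moore 9-neighborhood of one snaxel. It assumes the commonly used greedy-snake forms (deviation from the average snaxel spacing for continuity, squared second difference for curvature) and max-normalization over the neighborhood; the weights `alpha` and `beta` and all function names are hypothetical and may differ from the definitions in [23].

```python
import math

def moore_neighborhood(p):
    """The snaxel position itself plus its 8 surrounding pixels."""
    return [(p[0] + dx, p[1] + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def mean_spacing(contour):
    """Average distance between neighboring snaxels of a closed contour."""
    n = len(contour)
    return sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n)) / n

def normalize(values):
    """Scale values to [0, 1] by the neighborhood maximum (assumed normalization)."""
    m = max(values)
    return [v / m if m > 0 else 0.0 for v in values]

def internal_energies(contour, i, alpha=1.0, beta=1.0):
    """Internal energy of each candidate position for snaxel i (closed contour)."""
    n = len(contour)
    prev_pt, next_pt = contour[i - 1], contour[(i + 1) % n]
    d_bar = mean_spacing(contour)
    candidates = moore_neighborhood(contour[i])
    # Continuity: penalize deviation from the average snaxel spacing.
    cont = [abs(d_bar - math.dist(c, prev_pt)) for c in candidates]
    # Curvature: squared magnitude of the discrete second difference.
    curv = [(prev_pt[0] - 2 * c[0] + next_pt[0]) ** 2 +
            (prev_pt[1] - 2 * c[1] + next_pt[1]) ** 2 for c in candidates]
    cont_n, curv_n = normalize(cont), normalize(curv)
    return {c: alpha * a + beta * b for c, a, b in zip(candidates, cont_n, curv_n)}
```

For a corner snaxel of a square contour, the curvature term alone favors the candidate that cuts the corner, which is the smoothing behavior expected from this energy.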

We define the external energy function based on image features such as edge, color, and texture to cope with the problem of large aspect change. Therefore, the external energy function for snaxel is defined as The first term of external energy represents the edge energy. The edge energy attracts the target contour toward pixels with strong edges or pixels with large image gradients. In order to neglect noisy or weak edges, we define edge energy as where defines the color gradient at the snaxel and the gradient threshold is employed to remove weak edges. To calculate color gradient, different color channels, ,  for all , are smoothed using 2D Gaussian kernel, and image derivatives in and directions are calculated using Sobel operators. Then, we calculate the parameters of the color gradient as follows [24]: where and ,  for all  define the derivatives of color components and , for all  represents gradients directions.

We define normalized gradient energy, , as Figure 5 illustrates the results of the calculated edge energy image for different values.

The second term of the external energy is the similarity energy, . We define the similarity energy to attract a snaxel toward an image location with the same color distribution of the snaxel in the previous frame. To measure the similarity, the Mahalanobis distance in RGB color space is employed. We utilize the snaxel and its Moore 9-neighborhood to calculate the color distribution of the snaxel. For the control point , the normalized similarity energy is obtained as where is defined as The third term of external energy, which is called texture energy, employs image texture to define energy function. This energy is based on the entropy of color channels. The entropy is a method to measure the content of information. Pixels in the target boundary have higher information content or entropy in comparison with pixels located inside the target or sky background in SR. Therefore, by utilizing the texture energy we aim at attracting contour snaxels toward points with higher information content.

To measure texture energy, we first calculate the entropy for each individual color channel using the following equation: where is a random variable representing intensity level in color channel , and denotes its probability mass function for color channel which is determined using image histogram in Moore 9-neighborhood.
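The per-channel entropy can be sketched in Python as follows. The sketch assumes Shannon entropy with a base-2 logarithm, a probability mass function estimated from the histogram of the 3x3 Moore neighborhood, and clipping of the neighborhood at image borders; the function name `local_entropy` is hypothetical.

```python
import math
from collections import Counter

def local_entropy(channel, row, col):
    """Shannon entropy (in bits) of one color channel over the Moore 9-neighborhood.

    `channel` is a 2D list of intensity levels; the window is clipped at the borders.
    """
    h, w = len(channel), len(channel[0])
    values = [channel[r][c]
              for r in range(max(0, row - 1), min(h, row + 2))
              for c in range(max(0, col - 1), min(w, col + 2))]
    counts = Counter(values)  # local histogram of intensity levels
    total = len(values)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())
```

A uniform patch, such as clear sky, yields zero entropy, whereas a patch that straddles the target boundary yields a positive value, which is why the texture energy attracts snaxels toward the boundary.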

The normalized texture energy based on the entropy of color channels, for the snaxel , is defined as where color entropy image is obtained as Here , , and denote the entropy of , , and color channels, respectively. The minus sign in (21) is used to minimize the texture energy in the areas with high information content.

Figure 6 illustrates two typical aircraft images and their color entropy images. As it is shown in the figure, the target area and its boundary reveal higher entropy values.

2.3. Energy Minimization

After defining the energy function for the active contour, the objective energy function defined in (9) is minimized iteratively by means of a constrained greedy optimization routine in order to fit the deformable contour to the target precisely. Here, the term "constrained" refers to a set of conditions that are applied during the minimization algorithm. The pseudocode of the energy minimization procedure is given in Pseudocode 1, in which the proposed GoL cellular automaton is applied after each epoch of the energy minimization algorithm.

(1) For Iter = 1 to Iter_max (Iter_max = 5)
(2)  For each snaxel of the deformable contour, taken in order
   (i) Select the current snaxel of the deformable contour.
   (ii) Calculate the energy function for the current snaxel and its Moore 9-neighborhood.
   (iii) Find the location with minimum energy if there is only one local minimum in the Moore 9-neighborhood. Otherwise,
    find the point with the shortest path. The path is measured based on Euclidean distance.
  End for (snaxels).
(3) Move snaxels to new locations.
(4) Apply the proposed GoL cellular automaton, as explained in the next subsection.
(5) Update the characteristics of the deformable contour, including the coordinates of the new control points and the number of
  existing snaxels.
End for Iter.
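A minimal Python sketch of the greedy loop in Pseudocode 1 is given below. Several points are assumptions: the `energy` callable is a hypothetical stand-in for the full snaxel energy, ties between equal-energy candidates are broken by the shortest Euclidean move from the current position (one reading of step (iii)), the loop stops early once no snaxel moves, and the GoL regularization of step (4) is left as a comment.

```python
def greedy_snake(contour, energy, max_iter=5):
    """Greedy minimization of a snaxel energy over Moore 9-neighborhoods.

    contour : list of (x, y) snaxel coordinates
    energy  : callable energy(contour, i, candidate) -> float (hypothetical interface)
    """
    contour = list(contour)
    for _ in range(max_iter):
        new_positions = []
        for i, (x, y) in enumerate(contour):
            # Candidate locations: the snaxel itself and its 8 neighbors.
            candidates = [(x + dx, y + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            scored = [(energy(contour, i, c), c) for c in candidates]
            best = min(e for e, _ in scored)
            ties = [c for e, c in scored if e == best]
            # Assumed tie-break: the candidate closest to the current position.
            new_positions.append(
                min(ties, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2))
        moved = new_positions != contour
        contour = new_positions  # step (3): move all snaxels together
        # Step (4), the GoL cellular automaton, would regularize the contour here.
        if not moved:
            break  # control points are fixed
    return contour
```

With a toy energy that measures squared distance to a fixed point, a single snaxel walks one pixel per epoch toward that point and then stays there, which illustrates why the number of epochs bounds the displacement that the contour can recover within one frame.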

2.4. Game-of-Life-Based Snaxel Updating

Motion of both the target and the camera, such as 3D rotation, translation, and zooming by the video camera, changes the shape and outer boundary of the aircraft. The dynamic motion of snaxels toward the outer boundary of the aircraft during the energy minimization routine causes some irregular deformation in the contour. Consequently, the number of control points should be changed. To regularize and smooth the snaxels of the contour after energy minimization, we propose a GoL cellular automaton.

The GoL is essentially a type of two-dimensional cellular automaton [25]. Fundamentally, automata are simple agents with three main features: uniformity, synchronicity, and locality. Different cellular automata follow special rules to reach a distinct goal. Here, we design a set of rules for the GoL to handle changes in the snaxels of the active contour model. According to the rules defined in the proposed automaton, some snaxels, that is, alive cells, may die or remain unchanged during evolution. It is also possible for new snaxels to be created in the next generation. In the proposed cellular automaton, the neighborhood radius, , is set to 1 with the 8-cell Moore neighborhood, and the number of generations, , is considered to be 7. We consider two states, , with the set of states , in which the states and represent dead and alive cells, respectively. Figure 7 shows the state diagram of the proposed cellular automaton. In the proposed GoL cellular automaton, the next state of a snaxel is determined based on the following rules.
(i) The rule of continuance of life: a live central cell survives if its Moore 8-neighborhood matches one of the states in Figure 8(a).
(ii) The rule of death and birth: a live central cell dies, and its adjacent dead cell that is located between two alive cells becomes alive in the next generation, when a correspondence to one of the states in Figure 8(b) is found.
(iii) The rule of death: a live central cell dies if its Moore 8-neighborhood matches one of the states in Figure 8(c).
(iv) The rule of birth: a boundary dead cell with a live central cell that itself has one alive cell in its Moore 8-neighborhood is checked for the alive state. In this case, we check the Moore 8-neighborhood of the dead cell; if it matches one of the structures in Figure 9, the central dead cell is marked as alive in the next generation.

It is important to note that, with two cell states and the nine cells of the Moore 9-neighborhood, the total number of possible structures is 2^9 = 512. In the 2D active contour domain, it is not necessary to consider all of these structures. Therefore, we define rules only for the necessary ones.

Figure 10 shows simulation results for five initial states up to 6 generations. The results demonstrate that the snaxels are stable after at most 4 generations. The cellular automaton in the first row of Figure 10 represents an ideal case for a contour whose shape is preserved during evolution, whereas the other cellular automata exhibit some irregularity and discontinuity because of the energy minimization algorithm. These contours are smoothed and regularized using the proposed GoL cellular automaton.

3. Experimental Results

The proposed FVT algorithm was implemented in MATLAB and tested on a Pentium-IV desktop personal computer (PC) with a 2.8 GHz CPU and 512 MB of RAM. For maneuvering aircraft tracking purposes, we provided an informative dataset called the Maneuvering Aircraft Video DataBase (MAVDB). The database includes 72 video clips captured by diverse monocular, moving CCD camcorders. The dataset is freely available for academic use. The duration of the videos in MAVDB varies from 1.2 s to 51.64 s, with a frame rate of 15 Hz or 30 Hz. MAVDB includes maneuvering targets in different conditions such as large aspect change, 3D rotation around different axes, targets with change in size, rigid or piecewise rigid flying vehicles, smoky aircraft, cloudy atmospheric conditions, and illumination change.

Because noise bears a relative similarity to rain drops and/or ice crystals, we added Gaussian noise as well as salt-and-pepper noise to some video streams to simulate different atmospheric conditions, especially rainy and snowy situations.

In Figure 11, the results of the proposed FVT algorithm on eleven sample video clips of MAVDB are shown.

Figure 12 depicts results of the proposed method on three noisy video clips. In Figures 12(a) and 12(b), we test our FVT under additive white Gaussian noise (AWGN) with normal distributions of and , respectively. Figure 12(c) shows the results of the proposed tracking algorithm for video sequence with 12.5% additive salt and pepper noise (ASPN). The video sequence of Figure 12(a) shows a smoky fighter jet with maneuvering motion, and Figure 12(b) is a synthetic airplane video with large aspect change. In Figure 13, we have also plotted the trajectory for the true centroid (center of gravity (CoG)) and the tracked CoG of the target in Figure 12(a). These results demonstrate the robustness of our method in noisy video sequences.

Figure 14 depicts the results of the proposed FVT approach on four videos of MAVDB that appear to be more challenging than the others. In the video sequence of Figure 14(a), there is a relatively abrupt change in the statistical characteristics of the color distribution. Figure 14(b) shows a video sequence of a helicopter with two dominant colors, red and black. In the video sequence of Figure 14(c), direct sunlight changes and saturates the intensity values for some parts of the target. The background of the video also becomes cloudy in some frames. Figure 14(d) shows the video sequence of an aircraft with a textured surface. In such situations, the probability of tracking error increases and the target contour fluctuates during the tracking process. However, these instabilities are controlled and compensated for by the dynamic structure of the proposed tracking method, as shown in Figure 14(d).

In order to measure the performance of the proposed FVT algorithm, we have measured the expected probability of the tracking error using two different methods based on pixel-based performance evaluation (PE) algorithm. For this purpose, two criteria are considered and measured including alignment amount (AA) and confusion matrix (CM).

In order to measure the performance based on the AA criterion, we have defined the probability of error for a video frame as where and are ground truth (GT) and predicted result (PR) frames, respectively. Therefore, in a video clip containing frames, the mean error is calculated as follows: where the operator denotes the mathematical expectation.

The second performance evaluation method is based on a confusion matrix, . To evaluate performance in a frame in this method, we construct the confusion matrix as follows:

where the notations and stand for object and non-object, respectively, and the elements TP, FP, FN, and TN stand for true positive, false positive, false negative and true negative, respectively. In performance evaluation based on the confusion matrix, the expected probability of error for a video clip including frames is determined as where the total confusion matrix, , is calculated by using the following equation: In the above equation and are the height and width of the video frames.
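The confusion-matrix evaluation can be sketched in Python as follows. The sketch assumes binary object masks for GT and PR and takes the clip-level error as the fraction of misclassified pixels, (FP + FN) divided by the total pixel count accumulated over all evaluated frames; this ratio and the function names are assumptions for illustration and may differ from the exact definition used in the paper.

```python
def confusion_counts(gt, pr):
    """Pixel-wise (TP, FP, FN, TN) for one frame of binary object masks."""
    tp = fp = fn = tn = 0
    for gt_row, pr_row in zip(gt, pr):
        for g, p in zip(gt_row, pr_row):
            if g and p:
                tp += 1      # object pixel tracked as object
            elif p:
                fp += 1      # non-object pixel tracked as object
            elif g:
                fn += 1      # object pixel missed
            else:
                tn += 1      # non-object pixel correctly rejected
    return tp, fp, fn, tn

def clip_error_rate(frames):
    """Assumed clip-level error: misclassified pixels over all pixels in all frames.

    `frames` is an iterable of (gt_mask, pr_mask) pairs of equal size.
    """
    totals = [0, 0, 0, 0]
    for gt, pr in frames:
        for k, v in enumerate(confusion_counts(gt, pr)):
            totals[k] += v
    tp, fp, fn, tn = totals
    return (fp + fn) / (tp + fp + fn + tn)
```

Because the sky background usually dominates the frame, the TN count is large, which is consistent with the CM-based error being much smaller than the AA-based error in the reported results.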

Figure 15 exemplifies the method to determine GT and PR in a typical video frame in order to measure the performance for both AA and CM methods. Considering high volume of frames in database, we calculate GT and PR with the interval of 10 frames in our experiments. Figure 16 illustrates the probability of error in terms of frame number for video streams shown in Figure 14(d) for . As it is shown in this figure, both curves show that the probability of error is not accumulative and is controlled during the tracking process. Figure 17 shows the expected probability of error in terms of for video sequences shown in Figure 14(d).

To compare the results of the proposed algorithm with those of other methods, we also implemented the active contour method by Williams and Shah [11], the parametric active contour method by Lee et al. [17], and the method presented by Torkan and Behrad [18, 19]. We implemented and tested these methods using the collected MAVDB dataset. Table 1 shows the expected probability of tracking error for the implemented algorithms using the AA and CM methods. According to Table 1, the expected probabilities of error for the proposed algorithm over all videos in the database are 13.37% and 0.87%, based on the AA and CM methods, respectively. The table also shows that the proposed algorithm improves the accuracy by up to 17.77% and 1.85% based on the AA and CM methods, respectively.

4. Conclusions

In this paper, we presented a new algorithm for visually tracking maneuvering aircraft in the sky based on a deformable contour model in color video sequences. The main aim of the algorithm is to cope with the maneuvering motion and large aspect change of targets stemming from 3D rotation of the target or the camera. Furthermore, we examined other challenging issues such as noisy streams, different atmospheric conditions, and smoky aircraft. To test the proposed algorithm, we collected different video clips in a dataset called MAVDB. We implemented several existing methods and tested them alongside the proposed algorithm on the collected dataset. To evaluate the performance of the proposed method, we measured the probability of tracking error based on two criteria, AA and CM. The experimental results demonstrated that the proposed algorithm reduces the probability of tracking error considerably. In the error analysis for the proposed method, we observed that most of the errors during tracking originated from changes in the texture of the target and its background. The dynamic update strategy in the proposed tracking method controls these errors and prevents error accumulation, as shown in Figure 14. In future work, we will focus on other challenging issues such as occlusion and complex, cluttered backgrounds, which may occur during take-off and landing.

Acknowledgment

The authors would like to sincerely thank A. Taimori, Ph.D. candidate of System Communication in the Tehran Science and Research Branch, Islamic Azad University, Iran, for his valuable suggestions that have helped to improve the quality of this paper.