Abstract

Motion analysis based moving object detection from UAV aerial images remains an open problem because proper motion estimation is rarely taken into account. Existing moving object detection approaches for UAV aerial images do not use motion based pixel intensity measurement to detect moving objects robustly. Moreover, current research on moving object detection from UAV aerial images mostly depends on either the frame difference approach or the segmentation approach alone. This research has two main purposes: first, to develop a new motion model called DMM (dynamic motion model) and, second, to apply the proposed segmentation approach SUED (segmentation using edge based dilation), a frame difference based method, together with the DMM model. The proposed DMM model provides effective search windows based on the highest pixel intensity, so that SUED segments only a specific area for the moving object rather than searching the whole frame. At each stage of the proposed scheme, the experimental fusion of DMM and SUED extracts moving objects faithfully. Experimental results demonstrate the validity of the proposed methodology.

1. Introduction

Moving object extraction from a video sequence and estimation of the corresponding motion trajectories for each frame are typical problems of interest in computer vision. In real environments, however, moving object extraction becomes challenging due to unconstrained factors, that is, rural or cluttered environments, brightness or illumination changes, and static or dynamic object types, which together with motion degeneracy may render moving object extraction worthless [1–35]. Besides, due to rapid platform motion, image instability, and the relatively small size of the object-of-interest signatures within the resulting imagery, which depend on the flight altitude and camera orientation, the appearance of objects within the observed environment changes dramatically, making moving object detection a challenging task [27–31, 36–60].

Motion carries a great deal of information about moving object pixels, which plays an important role in moving object detection: as an image descriptor it provides a rich description of objects in different environments [11, 61–65]. Moments originated in physics and found their way into probability theory, and since computer vision leverages probability theory, this research finds it only natural to use moments to develop a dynamic motion model that can limit segmentation tasks in computer vision and thereby reduce computational complexity.

This paper presents moments based motion parameter estimation, called the dynamic motion model (DMM), to limit the scope of the segmentation stage, called SUED, which influences overall detection performance. Based on analysis of previous moving object detection frameworks, this paper embeds the DMM model within a frame difference based segmentation approach to achieve robust, optimum detection performance.

2. Background Study

For accurate detection, motion must be detected using suitable methods, which are affected by a number of practical problems such as motion change over time and the unfixed direction of the moving object. Motion pattern analysis before detecting each moving object has started to attract attention in recent years, especially for crowded scenarios where detecting each individual is very difficult [46]. Modelling object motion makes the detection task easier and also helps to handle noise.

As a scene may contain different motion patterns at one location within a period of time, for example, at a road intersection, averaging or filtering before knowing the local structure of the motion patterns may destroy that structure. This paper proposes to use effective motion analysis for moving object extraction. Figure 1 shows four existing approaches for motion analysis.

The global illumination compensation approach works on brightness or illumination changes. Because of this dependency on brightness, such research has not progressed far in the real world. For the parallax filtering approach, scenes containing strong parallax remain difficult for existing methods to segment well [46]. In the contextual information approach, contextual information has been applied to improve the detection of moving objects from UAV aerial images [46, 48]; this method assumes low level motion detection and does not consider the errors of low level motion segmentation under strong parallax. For the long-term motion pattern analysis approach [46], there is scope to use context information to improve both low level motion segmentation and high level reacquisition even when there is only a single object in the scene [46]. However, this approach still needs further research to detect moving objects robustly.

Table 1 compares the compatibility of the different approaches for motion based moving object detection from UAV aerial images. In Table 1, low level motion indicates motion without parallax, and high level motion indicates motion with parallax. None of the four types is fully suitable: the first is not a distinctive property for moving objects because of environmental conditions, while the second, third, and fourth require many parameter calculations that increase computational complexity.

Indeed, detection of motion and detection of objects are coupled: if motion is detected properly, detecting moving objects from UAV aerial images becomes easier. Very little research concentrates on adaptive, robust handling of noise, unfixed motion change, and unfixed moving object direction. For that reason, an adaptive and dynamic motion analysis framework is needed for better detection of moving objects from UAV aerial images, where the overall motion analysis reduces the dependency on parameters. In other words, detection of motion means detection of motion pixels in frames, which can be described by some function of the image pixel intensity; pixel intensity is nothing but the pixel color value. Moments are described with respect to their order, as in raised-to-the-power in mathematics. Very little previous research used image moments for motion analysis. Thus, this paper proposes to use image moments before segmenting individual objects and to use the motion pattern in turn to facilitate detection in each frame.

Moving object detection from UAV aerial images requires proper motion analysis, yet very few previous methods involve effective motion analysis. In [1, 16], the authors proposed a Bayesian network method that depends on a fixed object shape constraint and does not overcome the aspect ratio constraint. The image registration method is unsuitable because the detection rate decreases as the number of motion blocks increases. The clustering based approach does not suit complex environments and inconspicuous object features [10, 17]. The scale invariant feature transform (SIFT) uses only object key points and does not suit noisy environments [6]. The cascade classifier based approach uses gray scale input images, which is unrealistic for real-time object detection [9]. The symmetric property based approach gives good results only for structured object shapes [5, 11]. The background subtraction approach does not overcome the blob-related problems mentioned above for the registration method [8, 13, 38]. The shadow based approach depends on lighting conditions [2, 7, 59]. The region based appearance matching approach does not give optimum results for crowded scenarios [44, 54]. The histogram of oriented gradients (HOG) approach discards object backgrounds [36].

Among these methods, only the frame difference approach involves motion analysis, although most previous research does not provide motion estimation proper enough to handle the six uncertainty constraint factors (UCF) [29]. Frame difference based approaches register two consecutive frames first and then difference them to find moving objects. These approaches are fast but usually break a moving object into pieces, especially when the object's color distribution is homogeneous: frame difference detects the pixels with motion but cannot obtain a complete object [14, 37]. A segmentation based approach can overcome these shortcomings. Image segmentation determines the candidates of moving objects in each video frame [42]; it extracts a more complete shape of the objects and reduces the computation cost for moving object detection from UAV aerial images [39, 47]. However, it cannot distinguish moving regions from the static background. Using frame difference and segmentation together can achieve optimum detection performance, but the research in [42, 44] did not achieve reliable performance because it approximates a pure plane and its complexity must be kept low. Low detection rate and high computation time are the current research problems in applying frame difference and segmentation together [1, 15, 36, 40], as shown in Figure 2. Applying frame difference or segmentation separately, as is currently done, does not include the motion analysis that would increase the detection rate at low computational complexity.

This paper states that, since frame difference alone cannot obtain the motion of the complete object and segmentation alone cannot distinguish moving regions from the static background, applying frame difference and segmentation together is expected to give optimum detection results with high detection speed for moving object detection from UAV aerial images, instead of applying either one separately. For that reason, this paper proposes moments based motion analysis applied under a frame difference based segmentation approach (SUED), which ensures the robustness of the proposed methodology.

3. Research Methodology

The proposed moments based motion model is described in Section 3.1, and the frame difference based segmentation approach, segmentation using edge based dilation (SUED), is described in Section 3.2. Each part of the methodology introduces a new approach to ensure the robustness and accuracy of the detection.

3.1. Proposed Dynamic Motion Model (DMM)

In computer vision and information theory, moments measure the uncertainty of object pixels. An image moment is a particular weighted average (moment) of the image pixels' intensities, or a function of such moments, usually chosen to have some attractive property or interpretation. Image moments are useful for describing objects after segmentation. Properties of the image found via image moments include the centroid, the area (total intensity), and the object orientation, as shown in Figure 3.

Each of these properties needs to be invariant in the following senses: translation invariant, scale invariant, and rotation invariant. A feature is translation invariant if it does not change when the object is moved to a different position in space; in other words, a particular translation does not change the object. Scale invariance is a property of objects or laws that do not change when scales of length, energy, or other variables are multiplied by a common factor; the technical term for this process is dilation. A feature is rotation invariant if its value does not change when arbitrary rotations are applied to its argument.

Image moments can simply be described as some function of the image pixel intensity; pixel intensity is nothing but the pixel color value. Moments are described with respect to their order, as in raised-to-the-power in mathematics. This research calculates the zeroth, first, second, and higher moments from the raw moments and then transforms these moments into translation, scale, and rotation invariant forms. The organized structure used to calculate the moments is shown in Figure 4.

3.1.1. Raw Moments

Before finding central moments it is necessary to find the raw moments of a frame $I(x, y)$ in the video sequence. If $\bar{x}$ and $\bar{y}$ are the components of the centroid, the raw moments $M_{pq}$ of $I(x, y)$ for $p, q = 0, 1, 2, \ldots$ can be defined as
$$M_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} I(x, y). \quad (1)$$
Considering $I(x, y)$ as a 2D continuous function, (1) can be expressed as
$$M_{pq} = \iint x^{p} y^{q} I(x, y)\, dx\, dy, \quad (2)$$
where the centroid coordinates are as follows:
$$\bar{x} = \frac{M_{10}}{M_{00}}, \qquad \bar{y} = \frac{M_{01}}{M_{00}}. \quad (3)$$
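
As a concrete illustration, the following minimal NumPy sketch computes (1) and (3) for a grayscale frame; the function names and the array-based interface are our assumptions, not part of the paper.

```python
import numpy as np

def raw_moment(frame: np.ndarray, p: int, q: int) -> float:
    """Raw moment M_pq = sum_x sum_y x^p * y^q * I(x, y), per (1)."""
    y, x = np.indices(frame.shape, dtype=np.float64)  # pixel coordinates
    return float(np.sum((x ** p) * (y ** q) * frame))

def centroid(frame: np.ndarray) -> tuple[float, float]:
    """Centroid (x_bar, y_bar) = (M10 / M00, M01 / M00), per (3)."""
    m00 = raw_moment(frame, 0, 0)
    return raw_moment(frame, 1, 0) / m00, raw_moment(frame, 0, 1) / m00
```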

3.1.2. Central Moments

Using the centroid coordinates, the central moments $\mu_{pq}$ for $p, q = 0, 1, 2, \ldots$ can be defined as
$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} I(x, y). \quad (4)$$

(1) Translation Invariant. To make the moments translation invariant, $\mu_{pq}$ can be written in terms of the raw moments as
$$\mu_{pq} = \sum_{m=0}^{p}\sum_{n=0}^{q} \binom{p}{m}\binom{q}{n} (-\bar{x})^{p-m}(-\bar{y})^{q-n} M_{mn}. \quad (5)$$
Equation (5) gives the central moments of $I(x, y)$, which are translation invariant and derived from the raw moments.
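
A short sketch of (4), reusing raw_moment and centroid from the snippet above; again the names are ours.

```python
def central_moment(frame: np.ndarray, p: int, q: int) -> float:
    """Central moment mu_pq about the centroid, per (4); shifting the
    coordinates by (x_bar, y_bar) is what makes it translation invariant."""
    x_bar, y_bar = centroid(frame)
    y, x = np.indices(frame.shape, dtype=np.float64)
    return float(np.sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * frame))
```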

(2) Scale Invariant. Let $I'(x, y)$ be the scaled image of $I(x, y)$, scaled by a factor $\lambda$. So,
$$I'(x', y') = I(x, y), \quad (6)$$
where
$$x' = \lambda x, \qquad y' = \lambda y. \quad (7)$$
From (2), the following equation can be written:
$$\mu'_{pq} = \iint (x' - \bar{x}')^{p} (y' - \bar{y}')^{q} I'(x', y')\, dx'\, dy' = \lambda^{p+q+2}\mu_{pq}. \quad (8)$$

For $p = q = 0$ and assuming the total area to be 1, that is, $\mu_{00} = 1$, (8) becomes
$$\lambda^{2} = \mu'_{00}, \qquad \lambda = (\mu'_{00})^{1/2}. \quad (9)$$
Putting (9) into (8), the normalized central moments become
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1 + (p+q)/2}}. \quad (10)$$
Equation (10) gives the scale invariant central moments of $I(x, y)$.

(3) Rotation Invariant. Let $I'(x, y)$ be the new image of $I(x, y)$ rotated by an angle $\theta$. So,
$$I'(x', y') = I(x, y). \quad (11)$$
Using the variable transformation
$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta, \quad (12)$$
and using (2), the second-order central moments of the rotated image become
$$\mu'_{20} = \mu_{20}\cos^{2}\theta + \mu_{02}\sin^{2}\theta - \mu_{11}\sin 2\theta, \quad (13)$$
$$\mu'_{02} = \mu_{20}\sin^{2}\theta + \mu_{02}\cos^{2}\theta + \mu_{11}\sin 2\theta, \quad (14)$$
$$\mu'_{11} = \tfrac{1}{2}(\mu_{20} - \mu_{02})\sin 2\theta + \mu_{11}\cos 2\theta, \quad (15)$$
so that
$$\mu'_{20} + \mu'_{02} = \mu_{20} + \mu_{02}, \quad (16)$$
$$(\mu'_{20} - \mu'_{02})^{2} + 4\mu'^{2}_{11} = (\mu_{20} - \mu_{02})^{2} + 4\mu_{11}^{2}. \quad (17)$$
So the rotation invariant can be expressed as
$$\phi_{1} = \eta_{20} + \eta_{02} \quad (18)$$
or
$$\phi_{2} = (\eta_{20} - \eta_{02})^{2} + 4\eta_{11}^{2}. \quad (19)$$
Equation (19) is the rotation invariant central moment measurement of $I(x, y)$.
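
Continuing the sketch, (10) and (18)-(19) can be computed as follows; eta and rotation_invariants are our names for the normalized moments and the two invariants, building on central_moment above.

```python
def eta(frame: np.ndarray, p: int, q: int) -> float:
    """Scale invariant normalized central moment, per (10)."""
    mu00 = central_moment(frame, 0, 0)
    return central_moment(frame, p, q) / (mu00 ** (1 + (p + q) / 2))

def rotation_invariants(frame: np.ndarray) -> tuple[float, float]:
    """The two rotation invariants phi_1 and phi_2, per (18)-(19)."""
    n20, n02, n11 = eta(frame, 2, 0), eta(frame, 0, 2), eta(frame, 1, 1)
    phi1 = n20 + n02                         # (18)
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2   # (19)
    return phi1, phi2
```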

The output of the moments calculation proposed in this research is depicted in Figures 5(a)–5(c). Within a search window, the higher the moment value (e.g., the first moment), the higher the probability that the search window contains a moving object.

Before applying the newly proposed SUED algorithm with the DMM model, the input frame is projected onto the search window (black box) only, to reduce the risk of extraction failure. This research emphasizes that using a search window around the original object is very important: it limits the scope of segmentation to a smaller area, which means that the probability that the extraction gets lost because of a similarly coloured object in the background is reduced. Furthermore, the limited area increases the processing speed, making the extraction very fast.
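
One hedged way to realize such a search window, assuming moments are scored on a motion (difference) image and assuming the window size and stride below, is to slide candidate windows and keep the one with the largest zeroth moment:

```python
def best_search_window(diff: np.ndarray, win: int = 128, stride: int = 32):
    """Return the top-left corner (x0, y0) of the window whose total
    intensity (zeroth moment M00) is highest in a difference image."""
    best_score, best_xy = -1.0, (0, 0)
    h, w = diff.shape
    for y0 in range(0, h - win + 1, stride):
        for x0 in range(0, w - win + 1, stride):
            score = float(diff[y0:y0 + win, x0:x0 + win].sum())  # M00 of window
            if score > best_score:
                best_score, best_xy = score, (x0, y0)
    return best_xy
```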

3.2. SUED Segmentation Algorithm

This research uses morphological dilation to ensure that the extracted moving region contains the moving objects; the output of the morphological dilation is an edge-constrained image. First, each frame is decomposed into uniform, nonoverlapping blocks, as shown in Figure 6(c). Figures 6(a) and 6(b) show the original images at frames 101 and 102, obtained after the DMM step.

Let $f_t(x, y)$ be the original frame at time $t$ in a video sequence, where $(x, y)$ denotes a pixel position in the original frame, and let $F_t(X, Y)$ be the corresponding decomposed image, where $(X, Y)$ denotes the block position of the highest feature density area in the decomposed image, which ensures robustness to noise while the feature makes each block sensitive to motion. $F_t(X, Y)$ is defined by the following equation:
$$F_t(X, Y) = m_t(X, Y) + \alpha\,\big[N^{g}_{t}(X, Y) - N^{l}_{t}(X, Y)\big], \quad (20)$$
where $F_t(X, Y)$ is the feature densed block; $\alpha$ is a random constant smaller than 1; $m_t(X, Y)$ is the mean gray level of all pixels within the block at frame $t$; $N^{g}_{t}(X, Y)$ is the number of pixels with gray levels greater than $m_t(X, Y)$; and $N^{l}_{t}(X, Y)$ is the number of pixels with gray levels smaller than $m_t(X, Y)$. From (20), the difference image of two consecutive block images is obtained by
$$D_t(X, Y) = \operatorname{round}\!\left[\frac{|F_t(X, Y) - F_{t-1}(X, Y)|}{d_{\max}} \times 255\right], \quad (21)$$
where $D_t$ is the quantized image after the rounding operation and $d_{\max}$ is the maximum value in $|F_t - F_{t-1}|$. Using (21), the resultant difference is shown in Figures 7(a) and 7(b).
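
The sketch below implements this block stage under our reading of (20)-(21); the block size n, the constant alpha, and the 0-255 quantization range are assumptions.

```python
import numpy as np

def block_feature(frame: np.ndarray, n: int = 8, alpha: float = 0.5) -> np.ndarray:
    """F_t per our reading of (20): block mean plus alpha times the
    surplus of pixels above the block mean over pixels below it."""
    h, w = frame.shape
    F = np.zeros((h // n, w // n))
    for Y in range(h // n):
        for X in range(w // n):
            blk = frame[Y * n:(Y + 1) * n, X * n:(X + 1) * n].astype(np.float64)
            m = blk.mean()
            F[Y, X] = m + alpha * (np.sum(blk > m) - np.sum(blk < m))
    return F

def block_difference(F_t: np.ndarray, F_prev: np.ndarray) -> np.ndarray:
    """D_t per (21): absolute block difference normalized by its
    maximum, quantized onto [0, 255] by rounding."""
    d = np.abs(F_t - F_prev)
    d_max = d.max() if d.max() > 0 else 1.0
    return np.round(d / d_max * 255).astype(np.uint8)
```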

$D_t$ is then filtered by a median filter to give $\tilde{D}_t$. The threshold $T$ is obtained by the following formula:
$$T = m_{\tilde{D}_t} + w\,(h_{D_t} - h_{\tilde{D}_t}), \quad (22)$$
where $m_{\tilde{D}_t}$ is the mean of all blocks in $\tilde{D}_t$ at time $t$, $w$ is a positive weighting parameter, and $h_{D_t}$ and $h_{\tilde{D}_t}$ are the largest peaks of the histograms of $D_t$ and $\tilde{D}_t$, respectively. The binary image $B_t$ is obtained by the following condition, depicted in Figure 8(a):
$$B_t(X, Y) = \begin{cases} 1, & \tilde{D}_t(X, Y) \geq T, \\ 0, & \text{otherwise}. \end{cases} \quad (23)$$
$B_t$ may have discontinuous boundaries and holes. To ensure that $B_t$ contains the moving object, edge based morphological dilation is proposed here. Let $E_t$ represent the edge image, which can be obtained by a gradient operator such as the Sobel operator. The edge based morphological dilation $\Phi_t$ of the image is obtained by the following formula:
$$\Phi_t = (B_t \oplus S) \cap E_t, \quad (24)$$
where $S$ is the structuring element that contains elements $s_{ij}$.

Equation (24) is considered edge based dilation, which is expected to avoid integrating undesired regions into the result according to the edge based morphological dilation characteristics shown in Figure 8(d). Here $B_t \oplus S$ is the conventional dilation of $B_t$, shown in Figure 8(b).
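
A compact OpenCV sketch of (22)-(24) under the reconstruction above; the median kernel size, the weighting parameter w, the Sobel-magnitude edge threshold, and the 3x3 structuring element are assumptions.

```python
import cv2
import numpy as np

def sued_mask(D_t: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Binary motion mask via threshold (22)-(23) and edge based
    dilation (24); expects the uint8 difference image from (21)."""
    D_med = cv2.medianBlur(D_t, 3)                    # median filtering
    h_raw = int(np.bincount(D_t.ravel()).argmax())    # largest histogram peak of D_t
    h_med = int(np.bincount(D_med.ravel()).argmax())  # largest peak after filtering
    T = D_med.mean() + w * (h_raw - h_med)            # threshold per (22)
    B = (D_med >= T).astype(np.uint8)                 # binary image per (23)
    gx = cv2.Sobel(D_med, cv2.CV_64F, 1, 0, ksize=3)  # Sobel gradients
    gy = cv2.Sobel(D_med, cv2.CV_64F, 0, 1, ksize=3)
    mag = gx ** 2 + gy ** 2
    E = (mag > mag.mean()).astype(np.uint8)           # edge image E_t
    S = np.ones((3, 3), np.uint8)                     # structuring element S
    return cv2.dilate(B, S) & E                       # (B dilate S) AND E_t, per (24)
```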

This research proposes the following segmentation approach, the segmentation using edge based dilation (SUED) algorithm, for moving object extraction from UAV aerial images after extracting the frame from the DMM approach:
(1) Start.
(2) Compute the difference image $D_t$ between 2 frames, at $t$ and $t-1$, using (20)-(21).
(3) Filter $D_t$ with the median filter to obtain $\tilde{D}_t$.
(4) Threshold $\tilde{D}_t$ with (22)-(23) to obtain the binary image $B_t$.
(5) Obtain the edge image $E_t$ with the Sobel operator.
(6) Apply the edge based dilation (24) to obtain $\Phi_t$.
(7) End.
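
Tying the sketches together, a minimal driver for steps (1)-(7) might read as follows, assuming the two grayscale frames are already cropped to the DMM search window:

```python
def sued(frame_t: np.ndarray, frame_prev: np.ndarray) -> np.ndarray:
    """Run the SUED steps on two consecutive (windowed) frames."""
    F_t = block_feature(frame_t)         # (20)
    F_prev = block_feature(frame_prev)   # (20)
    D_t = block_difference(F_t, F_prev)  # (21), step (2)
    return sued_mask(D_t)                # steps (3)-(6)
```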

4. Experiment and Discussion

For the experiments, this research used the IMAGE PROCESSING LAB (IPLAB) framework, available at http://www.aforge.net/, embedded in Visual Studio 2012 with the C# programming language. The experimental analysis first evaluates the dynamic motion model (DMM) and then evaluates the proposed SUED embedded with DMM to extract moving objects from UAV aerial images.

Let $R_t$ be the labeled result of the SUED segmentation algorithm, shown in Figure 8(e). Each region in this image indicates coherence in intensity and motion. Moving objects are discriminated from the background by the fusion module, which combines DMM (dynamic motion model) and SUED (segmentation using edge based dilation). Let $r_i$ be a SUED region in $R_t$ and $A(r_i)$ its area. Let $A(r_i \cup m_i)$ be the area of the union of $r_i$ and the corresponding DMM motion region $m_i$; the coverage ratio is then defined as
$$\gamma_i = \frac{A(r_i)}{A(r_i \cup m_i)}. \quad (25)$$
If the value of $\gamma_i$ is greater than a given fusion threshold $T_f$, then $r_i$ is considered foreground; otherwise it is background. In general, the threshold varies for different sequences. In this research, the threshold is always set to 0.99; as it is very close to 1, it does not need to be adjusted, and the obtained area can contain the complete desired object. There may be some smaller blobs in the resulting image; regions with areas smaller than a size threshold are considered noisy regions. Figure 8(e) shows the extracted moving object as the result of the fusion of the DMM and SUED methodology.
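
A sketch of this fusion under the union-based reading of (25); the connected-component labeling, the motion-mask input, and the minimum blob area are assumptions.

```python
import cv2
import numpy as np

def fuse(sued_binary: np.ndarray, motion_mask: np.ndarray,
         t_fusion: float = 0.99, min_area: int = 20) -> np.ndarray:
    """Keep SUED regions whose coverage ratio (25) against the DMM
    motion mask exceeds t_fusion; drop tiny blobs as noise."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(sued_binary)
    out = np.zeros_like(sued_binary)
    for i in range(1, n):                          # label 0 is background
        region = labels == i
        area = int(stats[i, cv2.CC_STAT_AREA])
        if area < min_area:                        # noisy region
            continue
        union = int(np.logical_or(region, motion_mask > 0).sum())
        if area / union > t_fusion:                # coverage ratio per (25)
            out[region] = 255                      # foreground
    return out
```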

4.1. Datasets

This research used two UAV video datasets (actions1.mpg and actions2.mpg) from the Center for Research in Computer Vision (CRCV) at the University of Central Florida (http://crcv.ucf.edu/data/UCF_Aerial_Action.php). These videos were obtained using an R/C controlled blimp equipped with an HD camera. The collection represents a diverse pool of action features at different heights and aerial viewpoints. Multiple instances of each action were recorded at different flying altitudes, ranging from 400 to 500 feet, and were performed by different actors.

4.2. Results

This research extracted 395 frames at a rate of 1 frame/second from the actions1.mpg video dataset and 529 frames at the same rate from the actions2.mpg video dataset. Frame size is . Figures 6(a) and 6(b) show two consecutive frames (the 101st and 102nd) after the DMM step from actions1.mpg. Figures 7(a) and 7(b) show the pixel structure of the difference image $D_t$; Figure 8(a) shows the binary image $B_t$; Figure 8(b) shows the conventional dilation of $B_t$, while Figures 8(c) and 8(d) show the edge based dilation. Finally, Figure 8(e) shows the result of the proposed DMM and SUED based moving object detection.

4.2.1. DMM Evaluation

Figures 5(a), 5(b), and 5(c) show three search windows, depicted as black rectangles on the selected frame at time $t$. The brighter the moment values of the pixels, the higher the probability that the search window belongs to the moving object. Table 2 shows the measured moments.

In Table 2, search window 2 shows the highest moment quantity among the 3 search windows, which indicates that search window 2 has the highest probability of containing moving objects. Figure 9 shows a 3D line chart depicting the highest probability of pixel intensity in search window 2. This research therefore used search window 2, based on the probability of the highest moment distribution among the search windows, to extract the moving object using SUED, because doing so limits the scope of segmentation to a smaller area, reducing complexity and time. This means that the probability that the extraction gets lost because of a similarly coloured object in the background is reduced. Furthermore, the limited area increases the processing speed by making the extraction faster.

4.2.2. SUED Evaluation

This section presents the experimental analysis and results for the proposed SUED algorithm. The proposed approach was evaluated on the actions1.mpg and actions2.mpg videos. In order to evaluate the SUED algorithm, two metrics, the detection rate (DR) and the false alarm rate (FAR), are defined. These metrics are based on the following parameters:
(i) True positive (TP): detected regions that correspond to a moving object.
(ii) False positive (FP): detected regions that do not correspond to a moving object.
(iii) False negative (FN): moving objects not detected.
(iv) Detection rate, $DR = TP/(TP + FN)$.
(v) False alarm rate, $FAR = FP/(FP + TP)$.
From the dataset actions1.mpg, this research extracted 395 frames at 1 frame per second, and from actions2.mpg, 529 frames at the same rate. Details of the measured true positives (TP), false positives (FP), false negatives (FN), detection rate (DR), and false alarm rate (FAR) are given in Table 3.
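
For completeness, a small helper computing the two metrics as defined above; the counts passed in would come from Table 3.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (DR, FAR): DR = TP/(TP + FN), FAR = FP/(FP + TP)."""
    dr = tp / (tp + fn)
    far = fp / (fp + tp)
    return dr, far
```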

The detection rate increases as the number of input frames increases. The detection rate over the total frames of the two video datasets is shown in Figure 10.

The false alarm rate for the given number of frames from the two video datasets is shown in Figure 11.

The recall-precision curve (RPC) characterizing the performance of the proposed research is given in Figure 12.

This research measures the detection and false alarm rates as functions of the number of frames extracted from each input video dataset. Compared with [1, 15, 36, 40], which used frame difference approaches without a moments feature for motion modeling, this research achieved a detection rate of 79% (for video dataset actions2.mpg), which is a good indication of its ability to handle the motion of moving objects in future research. Thus, the proposed DMM model helps to reduce the segmentation task by providing the highest-probability area to be segmented by the proposed SUED, instead of segmenting the whole frame, thereby increasing processing speed.

5. Conclusion

The primary purpose of this research is to apply a moments based dynamic motion model under the proposed frame difference based segmentation approach, which ensures robust handling of motion, since translation, scale, and rotation invariant moment values are unique. As computer vision leverages probability theory, this research used moments based motion analysis, which provides a search window around the original object and limits the scope of the SUED segmentation to a smaller area. This means that the probability that the extraction gets lost because of a similarly colored object in the background is reduced. Since moments are a unique distribution of pixel intensity, the experimental results of the proposed DMM and SUED are very promising for the robust extraction of moving objects from UAV aerial images. Judging from previous research in the computer vision field, the proposed research should facilitate UAV operators and related researchers in further investigations, for example, in areas where access is restricted, rescue areas, human or vehicle identification in specific areas, crowd flux statistics, anomaly detection, and intelligent traffic management.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported by the Ministry of Higher Education Malaysia under research grant schemes FRGS/1/2012/SG05/UKM/02/12 and ERGS/1/2012/STG07/UKM/02/7.