Abstract

Abnormal running behavior frequently occurs in robberies and other criminal cases. To identify this behavior, a method for detecting and recognizing abnormal running based on spatiotemporal parameters is presented. To obtain more accurate spatiotemporal parameters and improve the real-time performance of the algorithm, a multitarget tracking algorithm based on the intersection area between the minimum enclosing rectangles of the moving objects is also presented. The algorithm effectively detects intersections among multiple targets and excludes interference, which makes tracking more accurate and more robust. Experimental results show that the combination of the two algorithms effectively detects and recognizes abnormal running behavior in surveillance videos.

1. Introduction

Most existing video surveillance systems only detect and track moving objects; they do not detect or recognize the behavior of those objects in the surveillance scene. However, the purpose of monitoring a scene is to detect and analyze unusual events or the abnormal behavior of people. For long video sequences, doing this manually is neither practical nor efficient, so the surveillance system loses its original purpose of prevention and active intervention and becomes little more than a tool for providing video evidence afterwards. Intelligent detection of abnormal behavior can alert staff in time to prevent illegal activities, and it also saves storage space and spares staff from searching through large amounts of video for evidence after an illegal act has occurred.

At present, methods for detecting abnormal behavior generally analyze the continuous motion trajectory of a moving object. First, the regions that change in the current frame are identified, and the objects (people) in those regions are tracked in real time. Second, velocity, acceleration, motion direction, and similar quantities are computed from the state information found in each frame, and state models are established. Finally, the state parameters of the test video are matched against the precalibrated parameters of a model that contains the normal and reference sequences of events, and abnormal events are detected according to the degree of match [1, 2]. Alternatively, shape, motion, and other information are extracted from the image sequence according to predetermined criteria. Based on this information, a normal behavior model is defined by manual or semisupervised methods, which usually model the states represented by the image-sequence features with a hidden Markov model (HMM); observations that do not match the normal behavior model are then considered abnormal [3, 4]. However, abnormal behaviors are unpredictable and infrequent, which limits supervised learning methods because they need a large number of training samples. Moreover, the complexity of events and actions often makes a simple event model insufficient to express a wide range of abnormal behaviors [5, 6].

In [7, 8], videos were divided into segments according to certain rules, and the features extracted from each sub-video were composed into a vector representing that sub-video. Clustering and similarity measures were then applied to these vectors, and the behavior in a sub-video was considered abnormal if the sub-video fell into a category with few members. However, the computation increased dramatically with the number of categories, which made real-time identification of abnormal behavior very difficult. Jian-hao and Li [9] proposed a method to identify abnormal behaviors such as robbery, fighting, and chasing in surveillance videos. The method recognized these behaviors from the disorder of their velocity and direction, but it could not distinguish among the three. Cheng et al. [10] proposed a method to detect and describe periodic motion, which can characterize the periodic motion of a nonrigid moving object, such as human running. To identify human running, they defined a descriptor derived from their periodic motion description. However, the method could not classify different running behaviors.

To meet the real-time requirement of surveillance systems, this paper proposes a method that detects abnormal running behavior in surveillance videos on the basis of spatiotemporal parameters. First, we extract foreground objects from the video using a Gaussian mixture model and frame subtraction [11, 12] and binarize the images. The foreground extraction algorithm involves nonlinear systems [13–15]. We then obtain a clearer foreground image with morphological processing. Next, to satisfy the real-time requirement, we present a multitarget tracking algorithm based on the intersection area between minimum enclosing rectangles, which can effectively track multiple objects under shelter (occlusion). Finally, abnormal running behavior is detected from the spatiotemporal relationship. Experimental results show the effectiveness and the real-time performance of the proposed algorithm.

This paper is organized as follows. In Section 2, the definitions of the normal running behavior and the abnormal behavior are presented. In Section 3, the method of multitarget tracking is described in detail. In Section 4, the approach to recognize abnormal running behavior is described. In Section 5, experimental results based on the surveillance video database are shown, and the conclusion is given in Section 6.

2. Definition of Running Behaviors

Abnormal running behavior frequently occurs in robberies and other criminal cases. To distinguish abnormal running from normal running, we first define the two behaviors as follows.

Definition 2.1 (normal running). The object gradually accelerates from walking or from a stationary state and reaches or exceeds the speed of normal running after a sufficiently long time, or the object enters the video scene at a speed greater than the speed of normal running. We define this action as Normal Running Behavior. It can be represented by the following equation:

$$(v_0 \le V_w,\ v_t \ge V_r,\ t \ge T_n) \quad \text{or} \quad v_0 \ge V_r. \tag{2.1}$$

Definition 2.2 (abnormal running). The object suddenly accelerates from walking or from a stationary state and reaches or exceeds the speed of normal running within a short time. We define this action as Abnormal Running Behavior. It can be written as

$$v_0 \le V_w, \quad v_t \ge V_r, \quad t < T_a, \tag{2.2}$$

where $v_0$ and $v_t$ are the initial velocity and the instantaneous velocity of the object of interest, respectively, and $V_w$ and $V_r$ are the speed of walking and the speed of normal running, respectively. In addition, $t$ is the time the object takes to go from a speed below $V_w$ to the speed $V_r$. When $v_t \ge V_r$, $T_n$ is the time threshold used to determine whether the motion is normal running, $T_a$ is the time threshold used to determine whether the motion is abnormal running, and $T_a \le T_n$. Diagrams of the two behaviors are shown in Figures 1 and 2.

In Figures 1 and 2, $t_0$ is the start time and $t_1$ is the moment at which the speed of the moving object reaches $V_r$, so that $t = t_1 - t_0$. The difference between the two figures is that $t \ge T_n$ in Figure 1, while $t < T_a$ in Figure 2. From this we draw two conclusions: the key to distinguishing running from nonrunning behavior is the speed of the moving target, and the key to differentiating normal running from abnormal running is the time the moving target takes to reach the running speed.
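The two definitions can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function and parameter names are hypothetical, and a single time threshold `t_abnormal` stands in for the separate thresholds for normal and abnormal running.

```python
from enum import Enum


class Motion(Enum):
    NOT_RUNNING = "not running"
    NORMAL_RUNNING = "normal running"
    ABNORMAL_RUNNING = "abnormal running"


def classify_motion(v0, vt, t, v_walk, v_run, t_abnormal):
    """Classify one speed-up episode of a tracked object.

    v0: initial speed, vt: current speed,
    t: time taken to go from below v_walk to v_run,
    t_abnormal: assumed single time threshold (hypothetical simplification).
    """
    if vt < v_run:
        # Speed alone separates running from nonrunning behavior.
        return Motion.NOT_RUNNING
    if v0 <= v_walk and t < t_abnormal:
        # Sudden acceleration from walking or standing still.
        return Motion.ABNORMAL_RUNNING
    # Gradual acceleration, or the object entered the scene already running.
    return Motion.NORMAL_RUNNING
```

For example, with `v_walk=1.5`, `v_run=4.0`, and `t_abnormal=1.0`, an object that goes from 0.5 to 5.0 in 0.3 time units is classified as abnormal running, while the same speed-up spread over 3.0 time units is classified as normal running.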

3. Target Tracking under Shelter

3.1. The Basic Idea of the Method

Between two adjacent frames, the position and the contour of the same object generally change only slightly, so the regions occupied by the object in the two frames usually intersect [16, 17]. This is an important property of continuous video sequences. Frame subtraction relies on it to detect moving objects, and this paper exploits it to track objects. The basic ideas are discussed in detail below.

In this paper, a moving object is marked with its minimum enclosing rectangle, represented as $R = (x, y, w, h)$, where $(x, y)$ are the coordinates of the upper-left corner and $w$ and $h$ are the width and the height of the rectangle. The centroid $(x_c, y_c)$ of the moving object is then calculated as

$$x_c = x + \frac{w}{2}, \tag{3.1}$$

$$y_c = y + \frac{h}{2}. \tag{3.2}$$

With (3.1) and (3.2) we obtain the centroid of each moving object. Let $R_1 = (x_1, y_1, w_1, h_1)$, with centroid $(x_{c1}, y_{c1})$, be a moving object in frame $k$, and let $R_2 = (x_2, y_2, w_2, h_2)$, with centroid $(x_{c2}, y_{c2})$, be a moving object in frame $k+1$. We consider that $R_1$ intersects with $R_2$ if they satisfy

$$|x_{c1} - x_{c2}| \le \frac{w_1 + w_2}{2}, \tag{3.3}$$

$$|y_{c1} - y_{c2}| \le \frac{h_1 + h_2}{2}. \tag{3.4}$$

Conditions (3.3) and (3.4) establish only that the two rectangles overlap. Because the position of an object usually changes little between two adjacent frames, the intersection area between $R_1$ and $R_2$ must additionally satisfy the area condition (3.5). In this paper, $R_1$ and $R_2$ are considered to intersect only if they meet (3.3), (3.4), and (3.5) simultaneously. Shelter often occurs in surveillance scenes because there are usually multiple moving objects, and (3.5) excludes slight shelter, which would otherwise disturb object tracking.
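The association test between a rectangle in one frame and a rectangle in the next can be sketched in Python as follows. This is a sketch under assumptions rather than the paper's implementation: the names are hypothetical, and because the exact form of the area condition is not reproduced here, it is assumed to require the intersection area to be at least a fraction `min_overlap` of the smaller rectangle's area (the default 0.3 is purely illustrative).

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float  # upper-left corner, horizontal coordinate
    y: float  # upper-left corner, vertical coordinate
    w: float  # width
    h: float  # height


def centroid(r):
    """Center of the enclosing rectangle."""
    return (r.x + r.w / 2.0, r.y + r.h / 2.0)


def intersection_area(a, b):
    """Area shared by two axis-aligned rectangles (0 if they do not overlap)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def is_associated(a, b, min_overlap=0.3):
    """Overlap test on centroids plus an assumed minimum-area condition."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    overlap_x = abs(ax - bx) <= (a.w + b.w) / 2.0
    overlap_y = abs(ay - by) <= (a.h + b.h) / 2.0
    smaller = min(a.w * a.h, b.w * b.h)
    if not (overlap_x and overlap_y) or smaller <= 0:
        return False
    # Assumed form of the area condition: reject overlaps that are too small.
    return intersection_area(a, b) >= min_overlap * smaller
```

With this sketch, two rectangles that merely touch at a corner pass the centroid tests but fail the area test, which is the kind of slight overlap the area condition is meant to reject.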

Shelter (occlusion) often happens in multitarget tracking: an object may be sheltered by other objects, by itself, or by a stationary object in the background, and the degree of shelter varies. Shelter can be divided into two stages. In the first stage, shelter is occurring: more and more target information is lost, which appears as two or more rectangles merging into one. In the second stage, shelter is disappearing: the target information is gradually restored, which appears as one rectangle separating into two or more rectangles.

Therefore, when shelter is occurring, the sheltered objects are merged into a new tracked object, and the histogram information of each sheltered object in the previous frame is recorded. When shelter is disappearing, each separated rectangle is matched against the recorded histograms of the tracked targets in order to recognize the separated objects.
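The matching step after shelter disappears can be sketched as follows. The text does not state which histogram similarity measure is used, so this sketch assumes normalized histogram intersection; the function names and the dictionary layout are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two histograms with the same bins.

    Assumed measure: intersection of the normalized histograms.
    """
    s1, s2 = sum(h1), sum(h2)
    if s1 == 0 or s2 == 0:
        return 0.0
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))


def match_separated(box_hist, recorded):
    """Return the id of the recorded object whose histogram is most similar.

    recorded: dict mapping object id -> histogram saved before the merge.
    """
    return max(recorded, key=lambda oid: histogram_intersection(box_hist, recorded[oid]))
```

In use, the histograms of the sheltered objects would be stored when their rectangles merge, and `match_separated` would be called once for each rectangle that reappears.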

3.2. The Exclusion of Interference

Interference in the moving object detection phase generally has two features: small size and short survival time. For the former, we are only interested in people, and people in video images are generally not too small, so a threshold can be used to remove small objects. Based on several experiments, we remove an object if its area is less than 30 pixels. For the latter, we maintain two lists, the list of temporary tracked objects, m_TempObjectList, and the list of tracked objects, m_TrackedObjectList, which are shown in Figure 3.

The node structures of the two lists are the same. Each node records the history information of a moving target, such as information from the tracking process or tracking information used for behavior analysis. These nodes are called tracked objects. The two lists differ in what they hold: m_TempObjectList records moving objects whose existence time has not yet reached a certain threshold, and m_TrackedObjectList records stable moving objects whose existence time has reached that threshold.

Based on many experiments, a moving object is inserted into m_TrackedObjectList and deleted from m_TempObjectList only when its existence time reaches 5 frames. This excludes short-term interference in the surveillance video. In addition, to avoid interference from objects that appear only in part, we process only objects that have completely entered the scene.
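The two interference filters can be sketched in a few lines of Python. The thresholds (30 pixels and 5 frames) come from the text; the data layout (tuples for detected blobs, dictionaries with an `age` field for nodes) and the function names are assumptions made for illustration.

```python
MIN_AREA = 30       # pixels; smaller blobs are treated as interference
STABLE_FRAMES = 5   # frames an object must exist before it counts as stable


def filter_small(blobs):
    """Drop blobs whose area is below MIN_AREA.

    blobs: list of (x, y, w, h, area) tuples (hypothetical layout).
    """
    return [b for b in blobs if b[4] >= MIN_AREA]


def promote_stable(temp_list, tracked_list):
    """Move nodes that have existed for STABLE_FRAMES frames from the
    temporary list to the tracked list (both lists are modified in place)."""
    still_temp = []
    for node in temp_list:
        if node["age"] >= STABLE_FRAMES:
            tracked_list.append(node)
        else:
            still_temp.append(node)
    temp_list[:] = still_temp
```

Here `temp_list` and `tracked_list` play the roles of m_TempObjectList and m_TrackedObjectList.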

3.3. The Proposed Algorithm

Algorithm 3.1. Multitarget tracking algorithm.
Input. The list of moving objects extracted from the current frame.
Step 1. Get a node from m_TempObjectList or m_TrackedObjectList. If any moving objects in the list of moving objects satisfy (3.3), (3.4), and (3.5) with the node, record those moving objects as associated objects in the node, and record the node as an associated node in each of those moving objects. If there are unprocessed nodes in m_TempObjectList or m_TrackedObjectList, go to Step 1. Otherwise, go to Step 2.
Step 2. If every node in m_TempObjectList and m_TrackedObjectList has been processed, go to Step 6. Otherwise, get the next node and count the number $N$ of its associated objects. If $N = 0$, go to Step 3; if $N = 1$, go to Step 4; otherwise, go to Step 5.
Step 3. $N = 0$ shows that the tracked object has disappeared in the current frame. If the node belongs to m_TempObjectList, it is deleted from m_TempObjectList. Otherwise, it is deleted from m_TrackedObjectList and inserted into m_TempObjectList. Go to Step 2.
Step 4. $N = 1$ means that only one object is associated with the node. If more than one associated node is recorded in that object, shelter is occurring and the shelter-occurring procedure begins. Otherwise, the node is updated with the information of the associated object. Go to Step 2.
Step 5. $N > 1$ indicates that several objects are associated with the node, so the shelter-disappearing procedure is applied. Go to Step 2.
Step 6. For each moving object that is not associated with any node in m_TempObjectList or m_TrackedObjectList, a new node is generated and inserted into m_TempObjectList. Go to Step 7.
Step 7. Update m_TempObjectList and m_TrackedObjectList: delete each node whose existence time has reached 5 frames from m_TempObjectList and insert it into m_TrackedObjectList.
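The association and dispatch logic of Steps 1 to 5 can be sketched as follows. This is a simplified illustration, not the authors' code: rectangles are plain `(x, y, w, h)` tuples, a plain overlap test stands in for the three intersection conditions, and the returned labels (`"disappeared"`, `"update"`, `"merge"`, `"split"`) are hypothetical names for the four outcomes.

```python
def overlaps(a, b):
    """Plain axis-aligned overlap test for (x, y, w, h) rectangles.
    Stand-in for the paper's intersection conditions."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def associate(nodes, detections):
    """Step 1: record, for each node, the detections it intersects,
    and for each detection, the nodes it intersects."""
    node_to_dets = {i: [] for i in range(len(nodes))}
    det_to_nodes = {j: [] for j in range(len(detections))}
    for i, node in enumerate(nodes):
        for j, det in enumerate(detections):
            if overlaps(node, det):
                node_to_dets[i].append(j)
                det_to_nodes[j].append(i)
    return node_to_dets, det_to_nodes


def decide(i, node_to_dets, det_to_nodes):
    """Steps 2 to 5: choose the action for node i from its association count."""
    dets = node_to_dets[i]
    if len(dets) == 0:
        return "disappeared"          # Step 3
    if len(dets) == 1:
        # Step 4: several nodes sharing one detection means shelter is occurring.
        return "merge" if len(det_to_nodes[dets[0]]) > 1 else "update"
    return "split"                    # Step 5: shelter is disappearing
```

Detections whose entry in `det_to_nodes` is empty correspond to Step 6: they are not associated with any node and would each start a new temporary node.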

Figure 4 shows the main flow chart of the algorithm. Mean shift and Particle Filter are the most popular tracking algorithms in intelligent video surveillance systems. Compared with them, the proposed multitarget tracking algorithm has the following advantages.
(1) Tracking results. Leaving shelter aside, Particle Filter tracks more accurately than Mean shift and is less affected by the background. Mean shift can track fast-moving targets, but it is vulnerable to background that is similar to the tracked target, which easily causes the tracking window to vibrate and makes the tracking result unstable. Like Particle Filter, our algorithm is little affected by the background, and it additionally excludes the two typical kinds of interference in surveillance video. Neither Particle Filter nor Mean shift can track an object that is entirely sheltered, but our algorithm can.
(2) Time complexity. Particle Filter is more complex than Mean shift. The time complexity of Particle Filter depends on the number of moving objects in the current frame and on the number of particles assigned to each moving object [18]. The time complexity of Mean shift depends on the average number of iterations per frame, the number of target pixels within the kernel window, and the cost of an arithmetic operation such as an addition [19]. By contrast, the time complexity of the proposed multitarget tracking algorithm depends only on the number of moving objects in the current frame.

4. Detection of Abnormal Running Behavior

4.1. Detection of Running Behavior

According to the conclusion of Section 2, the key to distinguishing running from nonrunning behavior is the speed of the moving object. The instantaneous speed of a target can be obtained simply from $v = \sqrt{(x_{k+1} - x_k)^2 + (y_{k+1} - y_k)^2}$, where $(x_k, y_k)$ and $(x_{k+1}, y_{k+1})$ are the target's centroids in frames $k$ and $k+1$, respectively. However, this does not take the actual shooting conditions into account. People may enter the surveillance scene from different angles, the distance between a person and the camera may change, and the focal lengths of cameras vary. A person standing far from the camera appears small in the image, and a person standing close to it appears large; the focal length affects the apparent size in the same way. Thus $v$ depends on the focal length of the camera and on the distance between the person and the camera. We therefore revise $v$ with (4.1), which combines a constant with the enclosing rectangles of the moving object in the two adjacent frames. Because the two rectangles are similar, an object that moves at the same actual speed obtains the same revised speed under different shooting conditions. The speed revised by (4.1) is therefore reliable.

However, the instantaneous speed is still unreliable for several reasons. First, human motion is a complex system with many degrees of freedom and nonlinear characteristics. Second, although the position and the contour of an object generally change little between two adjacent frames, interference may occur when the moving targets are extracted. Such interference can shift the computed centroid even though the target has not moved, or even shift it in the direction opposite to the motion, so the centroid computed with (3.1) and (3.2) and the resulting instantaneous speed are inaccurate. To reduce the influence of these factors, we use the average speed over a short time to decide whether the target is running. The average speed of a target is obtained from $\bar{v} = \frac{1}{n} \sum_{i=1}^{n} v_i$, where $v_i$ is the instantaneous speed in frame $i$ and $n$ is the number of frames.
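The smoothing effect of the average speed can be illustrated with a short Python sketch. It works on raw centroid displacements in pixels per frame and leaves out the revision for shooting conditions; the function names and the default window of 5 frames (the window used later by the recognition algorithm) are assumptions of this sketch.

```python
import math


def instantaneous_speeds(centroids):
    """Per-frame displacement of the centroid, in pixels per frame."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(centroids, centroids[1:])]


def average_speed(centroids, window=5):
    """Mean of the most recent `window` instantaneous speeds."""
    speeds = instantaneous_speeds(centroids)[-window:]
    return sum(speeds) / len(speeds) if speeds else 0.0
```

If the centroid freezes for one frame because of extraction noise, the instantaneous speed drops to zero for that frame, while the average over the window changes only moderately.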

4.2. Recognition between Normal and Abnormal Running

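Abnormal running is distinguished only under the condition $v_t \ge V_r$, where $v_t$ is the instantaneous speed of the moving target and $V_r$ is the speed of normal running. That is, only when the speed of the moving target has reached $V_r$ do we determine whether the running is normal. According to Definition 2.2, once the target has reached the running speed, the key to deciding whether the running is abnormal is the time $t$ it took to accelerate. If the running is abnormal, then $t < T_a$ and $v_0 \le V_w$, where $T_a$ is the time threshold for abnormal running, $v_0$ is the initial speed, and $V_w$ is the speed of walking. According to the Newton-Leibniz theorem,

$$v_t - v_0 = \int_0^t a(\tau)\, d\tau = \bar{a}\, t, \qquad \bar{a} = \frac{v_t - v_0}{t} > \frac{V_r - V_w}{T_a},$$

where $a(\tau)$ is the acceleration and $\bar{a}$ is the average acceleration over the interval. $V_r$, $V_w$, and $T_a$ are constants, so $(V_r - V_w)/T_a$ is a constant too, which is abbreviated as AMIN in this paper. Therefore, abnormal and normal running can be distinguished by judging whether $\bar{a}$ is greater than AMIN.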

4.3. The Proposed Recognition Algorithm

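Based on the above analysis, Definition 2.2 reduces to the following formula:

$$\bar{v} \ge V_r, \qquad \bar{a} > \text{AMIN},$$

where $\bar{v}$ and $\bar{a}$ are the average speed and the average acceleration of the moving target, $V_r$ is the speed threshold for running, and AMIN is the acceleration threshold.

According to this formula, to determine whether the behavior of a target is abnormal running, we only need to judge whether $\bar{v}$ and $\bar{a}$ of the moving target reach their thresholds. Based on many experiments, the thresholds for $\bar{v}$ and $\bar{a}$ are set to 4.0 and 0.4, respectively. This gives the criterion for detecting and recognizing abnormal running.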

Algorithm 4.1. Recognition algorithm for abnormal running.
Input. The list of moving objects extracted from the current frame.
Step 1. Track the moving targets in the current frame with Algorithm 3.1 to obtain m_TrackedObjectList. Go to Step 2.
Step 2. Get a tracked object from m_TrackedObjectList and calculate its average speed over 5 frames. If the average speed reaches the speed threshold (4.0), go to Step 3; otherwise, go to Step 4.
Step 3. Calculate the average acceleration of the object over 5 frames. If the average acceleration exceeds AMIN (0.4), the object is identified as abnormal. Go to Step 4.
Step 4. If there are tracked objects in m_TrackedObjectList that have not been processed, go to Step 2; otherwise, end.

Figure 5 shows the main flow chart of the algorithm.
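The per-object test in the recognition algorithm can be sketched in Python as follows. The thresholds 4.0 and 0.4 and the 5-frame window come from the text. How the average acceleration over the window is computed is not spelled out, so this sketch assumes it is the speed change across the window divided by the number of frame intervals; the names and the exact comparison operators are likewise assumptions.

```python
V_MIN = 4.0   # threshold on the average speed
A_MIN = 0.4   # threshold on the average acceleration (AMIN)
WINDOW = 5    # number of frames used for both averages


def is_abnormal_running(speeds):
    """Decide whether a tracked object is running abnormally.

    speeds: per-frame instantaneous speeds of one tracked object,
    oldest first (assumed to be already revised for shooting conditions).
    """
    if len(speeds) < WINDOW:
        return False  # not enough history to judge the object
    recent = speeds[-WINDOW:]
    avg_speed = sum(recent) / WINDOW
    if avg_speed < V_MIN:
        return False  # not running at all
    # Assumed definition: mean acceleration = total speed change / intervals.
    avg_accel = (recent[-1] - recent[0]) / (WINDOW - 1)
    return avg_accel > A_MIN
```

Applied to every node of the tracked list in each frame, this reproduces the loop of Steps 2 to 4: a steady runner passes the speed test but fails the acceleration test, while a sudden sprint passes both.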

5. Experimental Results

Our algorithms are implemented with the C++ interface of the OpenCV library and have been tested and evaluated in simple surveillance scenes and in complicated surveillance scenes from the open surveillance dataset PETS 2007 [20]. Objects are marked with rectangular bounding boxes in two colors. An abnormal object is marked with a black box, and a red “running” label appears above the box; any other object is marked with a red box without a label. In addition, the red text in the upper-left corner of the image shows the frame number and the number of objects, and the green line in the image is the trajectory of the objects.

5.1. The Selection of $V_r$ and AMIN

Figure 6(a) shows the relationship between the performance of the proposed algorithm and the value of the speed threshold $V_r$, and Figure 6(b) shows the relationship between the performance of the algorithm and the value of the acceleration threshold AMIN. According to Figure 6, the best value of $V_r$ is 4.0, and the best value of AMIN is 0.4.

5.2. In Simple Surveillance Scenes

The first test case is the detection of abnormal running in a simple scene with a single person. Figure 7 shows the result. In Figure 7(a), frame 425, the existence time of the target is less than 5 frames, so its behavior is not judged. In Figure 7(b), frame 433, the average speed of the person is greater than 4.0 and the average acceleration is greater than 0.4, so both meet the criterion for abnormal running and the person is identified as an abnormal target.

5.3. In Complicated Surveillance Scenes

Figures 8 and 9 illustrate two complicated cases of abnormal running detection. In Figure 8, there is more than one object but no shelter occurs between the objects, whereas shelter occurs in Figure 9.

In Figure 8(a), the average speed of one person (referred to as object 1) is greater than 4.0, but his average acceleration is less than 0.4, so object 1 is not an abnormal target in frame 65. Meanwhile, the average speed and the average acceleration of another person (referred to as object 3) both meet the criterion, so object 3 is an abnormal target. In addition, the existence times of object 4 and object 5 are both less than 5 frames, so their behaviors are not judged in frame 348. In Figure 8(b), the average speed of object 1 meets the speed criterion, but its average acceleration does not meet the acceleration criterion, so object 1 is not an abnormal target. The average speeds and average accelerations of objects 3, 4, and 5 all meet the criterion, so objects 3, 4, and 5 are all identified as abnormal targets in frame 356.

In Figure 9(a), the average speed of object 2 is less than 4.0, so it is not an abnormal target. The behavior of object 3 is not judged because its existence time is less than 5 frames. In Figure 9(b), serious shelter occurs and object 2 and object 3 are merged into a new object 4, so only object 4 needs to be judged. At this time the average speed and the average acceleration of object 4 both meet the criterion, so object 4 is identified as an abnormal target. The experimental results show that the algorithm can accurately detect abnormal running behavior in different scenes.

6. Conclusion

Abnormal running frequently occurs in robberies and other criminal cases. To identify such behavior in real time, this paper proposed a method based on spatiotemporal parameters that accurately detects abnormal running. To obtain precise spatiotemporal parameters and improve real-time performance, this paper also proposed a multitarget tracking algorithm based on the intersection area between the minimum enclosing rectangles of the moving objects. This simple, real-time algorithm effectively judges intersections among objects and excludes interference. In particular, two means of excluding interference are adopted in multitarget tracking, which remove objects that are too small or that stay in the scene too briefly. Thus, the complexity of multitarget tracking is reduced significantly, and the accuracy is improved.

Acknowledgments

This work was partially supported by Natural Science Foundation of China under Grant no. 61170326, Natural Science Foundation of Guangdong Province under Grant no. 9151806001000011, Science & Technology Planning Project of Shenzhen City (JC200903120115A, 0015533011100512097), and Public Technical Service Platform of Shenzhen (0015533054100524069).