Abstract

Over the past decades, crowd management has attracted a great deal of attention in the area of video surveillance. Among the various tasks of video surveillance analysis, crowd motion analysis is the basis of numerous subsequent applications. In this paper, a novel social force graph with streak flow attribute is proposed to capture both the global spatiotemporal changes and the local motion of crowd video. Crowd motion analysis is then performed based on the characteristics of the social force graph. First, the streak flow of the crowd sequence is extracted to represent the global crowd motion; after that, spatiotemporal analogous patches are obtained based on the crowd visual features. A weighted social force graph is then constructed based on multiple social properties of the crowd video. The graph is segmented into particle groups that represent similar motion patterns in the crowd video. A codebook is then constructed by clustering all local particle groups, and crowd abnormal behaviors are detected using the Latent Dirichlet Allocation (LDA) model. Extensive experiments on challenging datasets show that the proposed method achieves favorable results in crowd motion segmentation and abnormal behavior detection.

1. Introduction

With the rapidly increasing demand for public safety and security, crowd management is becoming one of the most challenging tasks for government departments. It has been reported that there were 215 human stampede events worldwide between 1980 and 2007, resulting in 7069 deaths and over 14,000 injuries. Manually analyzing surveillance video is obviously time-consuming. Moreover, modern high-throughput video surveillance systems produce a huge amount of image data, which makes manual analysis virtually impossible. Therefore, modeling crowd motion plays a significant role in video surveillance systems.

Crowd motion analysis has been intensively studied in computer vision. From a high-level perspective, current methods can be roughly divided into three categories: microscopic approaches, macroscopic approaches, and hybrid methods. Microscopic approaches analyze the motivation of individual pedestrians in the video and then extend individual behavior to crowd motion. Most of these methods first detect the movement of individual pedestrians or pedestrian groups and then analyze local crowd motion behavior [1–4]. Helbing et al. first introduced the social force model to simulate pedestrian dynamics. A few methods have used the social force model to detect and localize unusual events in crowd video [5–7]. Recently, some researchers have adopted trajectory-based approaches to analyze crowd behavior. Zhou et al. [8] and Bae et al. [9] modeled crowd behavior using individual pedestrian trajectories obtained with the KLT tracker [10]. However, object tracking still faces unsolved challenges, such as occlusion and pedestrians entering or exiting the scene.

Macroscopic approaches view the crowd as an entity instead of modeling the motion of individuals. Such algorithms usually use optical flow to capture crowd motion dynamics [11–13]. However, optical flow is instantaneous information between consecutive frames, and by itself it fails to capture motion over a long time range. A few researchers have modeled fluid dynamics to characterize global crowd behaviors. Ali and Shah constructed a Finite Time Lyapunov Exponent (FTLE) field to segment crowd flow [14], but the segmented regions were unstable when the crowd density was not very high or the crowd flow varied quickly.

Hybrid methods model crowd motion from both the macroscopic and the microscopic view, aiming to combine the benefits of both. Pan et al. first studied human and social behavior from the perspectives of human decision-making and social interaction for emergency egress analysis [15]. Borrmann et al. presented a bidirectional coupling technique to predict pedestrian evacuation time [16]. Recently, Bera and Manocha modeled crowd motion based on discrete and continuous flow models: the discrete model is based on a microscopic agent formulation, and the continuum model analyzes macroscopic behaviors [17]. Inspired by these hybrid methods, we model crowd motion patterns by combining a macroscopic and a microscopic model of crowd dynamics. The macroscopic motion model is derived from the streak flow representation, which can depict motion dynamics over an interval of time [18]. The microscopic motion model is based on the social force between the particle groups of the crowd sequence.

Figure 1 illustrates the general flow chart of our proposed method for modeling crowd motion and detecting abnormal behavior. We first segment the crowd sequence into spatiotemporal patches based on the similarity of streak flow. Then, we construct a weighted social force graph using the particle advection scheme [14]. We cluster the crowd video into particle groups that represent different crowd motion patterns. Finally, we use Bag of Words (BoW) to generate a codebook and analyze crowd motion behavior with the Latent Dirichlet Allocation (LDA) model [19]. We evaluate our method on three public datasets and compare it with a number of state-of-the-art methods. The experimental results show that the proposed method achieves favorable results in crowd motion segmentation and crowd abnormal behavior detection.

In summary, the main contributions of this paper are as follows:
(1) a framework for analyzing crowd motion behavior that takes both global spatiotemporal motion and local pedestrian interaction into account;
(2) the construction of a weighted social force graph that extracts local motion patterns based on multiple social properties;
(3) the use of the LDA model to represent the relationship between the codebook distribution and crowd behavior patterns.

The remainder of this paper is organized as follows. Section 2 describes how to extract the streak flow, which integrates optical flow over an interval of time. Section 3 details the construction of the weighted social force graph for extracting local crowd motion. Section 4 demonstrates the strength of our method in crowd motion segmentation and crowd abnormal behavior detection. Section 5 presents experimental results on different datasets and comparisons with existing methods. Finally, Section 6 concludes the paper and discusses potential extensions.

2. Streak Flow Representation of Crowd Motion

Streak flow was proposed in [18] as an accurate representation of the motion field of crowd video. Optical flow only provides motion information over a small time interval. Streak flow, by contrast, captures the temporal evolution of moving objects over a period of time. Our approach to computing streak flow is similar to that proposed in [18]; for completeness, we briefly summarize its steps. First, we initialize a grid of regular particles at time $t$. We then update the positions of the particles according to the optical flow at each instant. Let $(x_i^p(t), y_i^p(t))$ denote the position at instant $t$ of the particle that was initialized at point $p$ in frame $i$, for $i \le t$. The position of the particle at any instant is given by [18]

$$x_i^p(t+1) = x_i^p(t) + u\big(x_i^p(t), y_i^p(t), t\big), \qquad y_i^p(t+1) = y_i^p(t) + v\big(x_i^p(t), y_i^p(t), t\big), \tag{1}$$

where $u$ and $v$ represent the velocity field at time $t$. This generates a set of curves starting from point $p$. For steady flow, these curves coincide with the particle paths, but in unsteady flow they vary dramatically. To fill the gaps in unsteady flow, we propagate the particles backward based on the current flow field, using the fourth-order Runge-Kutta-Fehlberg algorithm [14].
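The particle update of the streak flow computation can be sketched in Python with NumPy as follows. This is a minimal illustration under our own assumptions:
- the optical flow is given as arrays `u` and `v` of shape (T, H, W);
- flow is sampled at subpixel positions by bilinear interpolation;
- positions are updated with a simple forward step (new position = old position + sampled flow), not the Runge-Kutta-Fehlberg integration of [14].

The function names are hypothetical.

```python
import numpy as np


def bilinear(field, x, y):
    """Sample a 2D array at subpixel positions (x = column, y = row), clamped to the image."""
    h, w = field.shape
    x = np.clip(x, 0.0, w - 1.0)
    y = np.clip(y, 0.0, h - 1.0)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = x - x0
    fy = y - y0
    top = (1 - fx) * field[y0, x0] + fx * field[y0, x1]
    bottom = (1 - fx) * field[y1, x0] + fx * field[y1, x1]
    return (1 - fy) * top + fy * bottom


def advect_streakline(u, v, seed):
    """Seed one particle at `seed` = (x, y) in every frame and move all particles with the flow.

    Returns the final (x, y) positions; index i is the particle initialized at frame i.
    """
    num_frames = u.shape[0]
    px = np.empty(0)
    py = np.empty(0)
    for t in range(num_frames):
        # A new particle is initialized at the seed point in frame t.
        px = np.append(px, float(seed[0]))
        py = np.append(py, float(seed[1]))
        # Forward step: x(t+1) = x(t) + u(x(t), y(t), t), and likewise for y with v.
        du = bilinear(u[t], px, py)
        dv = bilinear(v[t], px, py)
        px, py = px + du, py + dv
    return px, py
```

Consider a constant flow of one pixel per frame to the right. The particle initialized in the earliest frame travels farthest, so the returned positions trace the streakline back toward the seed point.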

However, particle positions usually have subpixel accuracy. To obtain the streak flow at every pixel, we use the three nearest neighbor pixels of each particle and compute the streak flow by linear interpolation. Here, we describe the computation of the vertical component $v_s$; the horizontal component $u_s$ is estimated in the same way. Let $V(k)$ denote the vertical streak flow of particle $k$. Then, we define

$$V(k) = \sum_{i=1}^{3} a_i\, b_i(k), \tag{2}$$

where $i$ is the index of the neighbor pixels, $a_i$ is the unknown streak flow value at the $i$th neighbor pixel, and $b_i$ is the known triangulation function [18] for the $i$th neighbor pixel. Applying formula (2) to all particles, we obtain the linear system

$$B\mathbf{a} = \mathbf{V}, \tag{3}$$

where each row of $B$ holds the triangulation weights of one particle. We solve this system for $\mathbf{a}$, which gives the vertical streak flow $v_s$ at each pixel. Figure 2 illustrates the difference between optical flow and streak flow; streak flow better depicts motion changes.
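The interpolation step can be sketched in Python with NumPy under the following assumptions, which are ours and not necessarily those of [18]:
- each pixel cell is split along its anti-diagonal;
- each particle uses the three vertices of the enclosing grid triangle with linear (barycentric) weights as its triangulation function;
- the matrix of weights is assembled densely;
- the system is solved by least squares with `np.linalg.lstsq`.

The function names are hypothetical.

```python
import numpy as np


def triangle_weights(x, y):
    """Return the three pixel vertices (col, row) of the grid triangle containing (x, y)
    and the linear (barycentric) weights of the point with respect to them."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    if fx + fy <= 1.0:
        verts = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1)]
        weights = [1.0 - fx - fy, fx, fy]
    else:
        verts = [(x0 + 1, y0 + 1), (x0 + 1, y0), (x0, y0 + 1)]
        weights = [fx + fy - 1.0, 1.0 - fy, 1.0 - fx]
    return verts, weights


def pixel_streak_component(px, py, values, shape):
    """Solve B a = V in the least-squares sense for one streak flow component.

    px, py: subpixel particle positions; values: the component (e.g. vertical) at each particle;
    shape: (H, W) of the pixel grid. Positions must satisfy 0 <= x < W-1 and 0 <= y < H-1.
    """
    h, w = shape
    B = np.zeros((len(values), h * w))
    for k, (x, y) in enumerate(zip(px, py)):
        verts, weights = triangle_weights(x, y)
        for (c, r), b in zip(verts, weights):
            B[k, r * w + c] += b  # weight of pixel (r, c) in particle k's interpolation
    a, *_ = np.linalg.lstsq(B, np.asarray(values, dtype=float), rcond=None)
    return a.reshape(h, w)
```

Linear interpolation reproduces linear fields exactly. If the particles sampled from a linear field cover every triangle, the solve therefore recovers that field at the pixels, which makes a convenient sanity check.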

3. Social Force Graph for Crowd Motion Analysis

The streak flow described above represents particle motion over a period of time. In crowd behavior analysis, however, we are more concerned with the motion of particle groups than with that of a particular particle. Most existing methods extract group motion by clustering spatiotemporal features. Yet the interaction between particles plays an important role in crowd flow: each pedestrian determines and adjusts his or her own motion state according to the motion of neighbors and the global motion within view. Recently, the social force model has been adapted to simulate the dynamics of pedestrian movement [1, 5, 7, 20, 21], and it has achieved satisfactory performance in several crowd applications.

Based on the above motivation, we propose to represent moving particle groups by a social force graph with streak flow attribute. We use streak flow to represent the global motion state over a period of time. Meanwhile, we extract the interaction social force between particles to represent the influence of individual movement. We then construct a weighted social force graph to extract particle groups that reveal local crowd motion patterns.

3.1. Partition Spatiotemporal Analogous Patches

To model crowd motion, we first partition the crowd sequence into analogous patches that have similar spatiotemporal features. Specifically, we generate analogous patches by clustering pixels with similar motion features. For a given image in the crowd sequence, let $p$ denote any pixel of the image. We define the similarity between pixels based on streak flow similarity and distance similarity.

For any two pixels $p_i$ and $p_j$, the distance similarity is defined by formula (4). To describe the similarity of motion direction, we divide the whole direction plane into sixteen equal angular bins. The motion direction of each pixel is quantized into one of these bins according to its streak flow direction. The direction similarity of two pixels is then defined by formula (5). The motion magnitude similarity of pixels is given by formula (6), in which $\bar{m}$ denotes the average magnitude of the crowd streak flow, and $u_s$ and $v_s$ denote the streak flow of a pixel in the horizontal and vertical directions, respectively.
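The direction quantization described above can be sketched in Python with NumPy. The sketch computes only the per-pixel quantities that the similarity terms rely on: the direction bin, and the magnitude relative to the average streak flow magnitude. It does not reproduce the exact similarity formulas of this section, and the function names are hypothetical.

```python
import numpy as np

NUM_BINS = 16  # the direction plane is divided into sixteen equal bins


def direction_bins(us, vs):
    """Quantize streak flow direction (u_s, v_s) into one of 16 equal angular bins (0..15)."""
    angle = np.mod(np.arctan2(vs, us), 2 * np.pi)  # angle in [0, 2*pi)
    bins = np.floor(angle / (2 * np.pi / NUM_BINS)).astype(int)
    return np.minimum(bins, NUM_BINS - 1)  # guard against rounding up to 2*pi


def relative_magnitude(us, vs):
    """Streak flow magnitude of each pixel divided by the average magnitude over the frame."""
    mag = np.hypot(us, vs)
    return mag / max(mag.mean(), 1e-12)
```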

To partition the crowd sequence into $K$ spatiotemporal analogous patches $P_k$, we first divide the crowd image into regular patches and place a seed in each regular patch. The partition then iteratively finds the nearest seed for each pixel and adds the pixel to the patch of that seed. In each iteration, the position of each seed is updated to the center of its patch. The nearest seed is determined by minimizing the cost in formula (7), which combines the spatiotemporal motion similarity and the spatial distance. In formula (7), $p$ is any pixel in patch $P_k$, $c_k$ is the center pixel of $P_k$, and $K$ is the number of spatiotemporal analogous patches specified in our experiments. The three parameters of formula (7) are balance weights tuned for each crowd sequence.
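The iterative seed assignment resembles SLIC superpixel clustering; the Python/NumPy sketch below illustrates it. The exact form of formula (7) is not reproduced here, so the cost is a hypothetical stand-in made of three terms:
- a direction-bin mismatch term, weighted by `alpha`;
- a relative magnitude gap, weighted by `beta`;
- a spatial distance normalized by the nominal patch size, weighted by `gamma`.

For simplicity, each pixel is compared with every seed rather than only with nearby seeds.

```python
import numpy as np


def partition_patches(direction_bin, rel_mag, grid, alpha=1.0, beta=1.0, gamma=1.0, iters=10):
    """SLIC-style partition of one frame into analogous patches.

    direction_bin, rel_mag: (H, W) arrays of per-pixel motion features.
    grid: (rows, cols) of the initial regular patches, one seed per patch.
    Hypothetical cost: alpha * [bins differ] + beta * |magnitude gap| + gamma * distance / S.
    """
    h, w = direction_bin.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Initial seeds at the centers of the regular patches.
    ys = (np.arange(grid[0]) + 0.5) * h / grid[0] - 0.5
    xs = (np.arange(grid[1]) + 0.5) * w / grid[1] - 0.5
    seeds = np.array([(y, x) for y in ys for x in xs], dtype=float)
    spacing = np.sqrt(h * w / len(seeds))  # nominal patch size S
    labels = np.zeros((h, w), dtype=int)
    for _ in range(iters):
        # Motion features of each seed are read at its (rounded) center pixel.
        cy = np.clip(np.rint(seeds[:, 0]).astype(int), 0, h - 1)
        cx = np.clip(np.rint(seeds[:, 1]).astype(int), 0, w - 1)
        cost = (alpha * (direction_bin[None] != direction_bin[cy, cx][:, None, None])
                + beta * np.abs(rel_mag[None] - rel_mag[cy, cx][:, None, None])
                + gamma * np.hypot(rows[None] - seeds[:, 0, None, None],
                                   cols[None] - seeds[:, 1, None, None]) / spacing)
        labels = np.argmin(cost, axis=0)  # nearest seed for every pixel
        for k in range(len(seeds)):
            mask = labels == k
            if mask.any():  # move the seed to the center of its patch
                seeds[k] = rows[mask].mean(), cols[mask].mean()
    return labels
```

With a large `alpha`, the direction term dominates, and pixels moving in different directions end up in different patches even when they are spatially close.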

Figure 3 illustrates an example of analogous patches obtained with the above method, with the partition overlaid on the original image. The result shows that our partition method, based on the spatiotemporal similarity of particles, captures changes of moving objects. Pixels with similar motion patterns tend to be assigned to the same patch, for example, the patches labeled 1, 2, and 3.

3.2. Social Force Graph for Particle Groups

Spatiotemporal analogous patches reflect the global visual similarity of moving pedestrians, but they neglect the interaction between pedestrians. Here we describe how to construct a weighted social force graph $G = (V, E)$ to model the group patterns of a crowd, where $V$ is a set of vertices and $E$ is a set of edges. In this paper, the vertex set $V$ is the set of regular particles in the particle advection scheme. The edge set $E$ measures the motion similarity between individuals based on their interaction social forces. Using the graph $G$, we can cluster similar particles into particle groups, which form the basis of crowd motion behavior analysis.

We first compute the interaction social force of the particles. The interaction social force model describes crowd behavior by considering the interaction of individuals and the constraints of the environment. Following [5], the model is

$$F_{int} = \frac{1}{\tau}\left(v_i^q - v_i\right) - \frac{dv_i}{dt},$$

where $F_{int}$ is the interaction social force of particle $i$, $v_i^q$ denotes the personal desired velocity, $v_i$ indicates the actual velocity of the individual, and $\tau$ is the relaxation parameter of [5]. In this paper, $v_i$ is represented by the streak flow of particle $i$ defined in Section 2. The desired velocity $v_i^q$ is defined in terms of the streak flow of the particle and the average streak flow of all particles in the patch to which the particle belongs.
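For illustration, the interaction force of [5], $F_{int} = \frac{1}{\tau}(v_i^q - v_i) - \frac{dv_i}{dt}$, can be evaluated per particle with the Python/NumPy sketch below. Its assumptions are:
- the desired velocity is taken as $(1-p)\,S_i + p\,\bar{S}$, mixing the particle's streak flow $S_i$ with its patch average $\bar{S}$ by a weight $p$, in the spirit of [5]; the exact form used in this paper is not reproduced;
- $dv_i/dt$ is approximated by a finite difference;
- `panic`, `tau`, and `dt` are hypothetical parameter names.

```python
import numpy as np


def interaction_force(v_prev, v_curr, v_patch_mean, panic=0.5, tau=0.5, dt=1.0):
    """Interaction social force F_int = (v_q - v) / tau - dv/dt for a batch of particles.

    v_prev, v_curr: (N, 2) streak flow of each particle at the previous and current step.
    v_patch_mean: (N, 2) average streak flow of the patch each particle belongs to.
    The desired velocity mixes the particle's own flow with its patch average (assumed form).
    """
    v_desired = (1.0 - panic) * v_curr + panic * v_patch_mean
    dv_dt = (v_curr - v_prev) / dt  # finite-difference estimate of the acceleration
    return (v_desired - v_curr) / tau - dv_dt
```

A particle moving steadily with its patch average feels no interaction force. One that deviates from its patch average is pulled back toward it, which is why large force magnitudes flag motion that differs from the collective movement.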

As shown in [5], a high magnitude of interaction social force implies that the particle's motion differs from the collective movement of the crowd. However, the interaction force describes the motion of only a single particle. In real-world crowd analysis, we usually need particle groups with similar motion patterns to characterize crowd behavior. Therefore, in this paper, we use a weighted social force graph to extract different particle groups by considering multiple factors, namely, a distance factor and an initial analogous patch factor. For the weighted graph $G$, the weight $w(v_i, v_j)$ of the edge from particle $v_i$ to particle $v_j$ measures the dissimilarity between the two particles. Its value is defined using two weighting factors, $\lambda_d$ and $\lambda_p$, based on the distance factor and the analogous patch factor, respectively. The smaller the value of $w(v_i, v_j)$, the more similar the two particles are.

The distance weighting factor $\lambda_d$ measures the influence of distance. We assume that the motion similarity of two particles is inversely proportional to the distance between them; that is, a particle influences nearby particles more strongly than distant ones.

Based on this observation, $\lambda_d$ is defined as a function of an influence factor $\sigma$ and the Euclidean distance between the particles. The analogous patch weighting factor $\lambda_p$ indicates whether two particles belong to the same initial analogous patch extracted in Section 3.1: if they are in the same patch, $\lambda_p$ is zero; otherwise, it is one.
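The exact forms of the edge weight and of the distance factor are not reproduced here. The Python/NumPy sketch below therefore uses a hypothetical additive combination of three terms:
- the difference of interaction social forces;
- a distance term that grows with Euclidean distance, so that nearby particles are treated as more similar, as assumed above;
- the binary analogous-patch factor: 0 for particles in the same patch, 1 otherwise.

Only particles within a hypothetical `radius` are connected, which keeps the graph sparse.

```python
import numpy as np


def social_force_graph(positions, forces, patch_labels, radius=2.0, sigma=1.0):
    """Build the weighted edge list of a social force graph over particles.

    Hypothetical weight: w = ||F_i - F_j|| + lambda_d + lambda_p, where
    lambda_d = 1 - exp(-d^2 / (2 sigma^2)) grows with the Euclidean distance d and
    lambda_p = 0 if the particles share an analogous patch, else 1.
    Only pairs closer than `radius` are connected. Smaller weight = more similar particles.
    """
    edges = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = float(np.linalg.norm(positions[i] - positions[j]))
            if d > radius:
                continue
            lam_d = 1.0 - np.exp(-d ** 2 / (2.0 * sigma ** 2))
            lam_p = 0.0 if patch_labels[i] == patch_labels[j] else 1.0
            w = float(np.linalg.norm(forces[i] - forces[j])) + lam_d + lam_p
            edges.append((i, j, w))
    return edges
```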

We use the graph-based clustering method of [22] to extract particle groups that indicate similar motion patterns over a period of time. Section 4 describes how the particle groups are obtained.

4. Application of Social Force Graph with Streak Flow Attribute

Using the weighted social force graph with streak flow attribute, we demonstrate the strength of our method for crowd motion segmentation and crowd abnormal behavior detection.

4.1. Crowd Segmentation

Based on the social force graph, we can obtain particle groups with similar motion patterns. In the graph $G$, the edge weights indicate the dissimilarity of motion patterns between particles. In [22], a greedy method was presented that segments an image into regions based on the intensity difference across region boundaries. Motivated by this method, we first define a predicate value that determines whether a boundary exists between graph regions, and then segment the graph into components. Particles within each component have small motion differences, so different components represent dissimilar particle groups. To measure the motion difference of all particles in a component $C$, we use

$$Int(C) = \max_{e \in MST(C, E)} w(e),$$

where $MST(C, E)$ is the minimum spanning tree of component $C$. We define the motion difference between two components as

$$Dif(C_1, C_2) = \min_{v_i \in C_1,\ v_j \in C_2,\ (v_i, v_j) \in E} w(v_i, v_j).$$

When $Dif(C_1, C_2)$ is smaller than the predicate value, we merge the two components. As in [22], the predicate value is updated as

$$MInt(C_1, C_2) = \min\big(Int(C_1) + \tau(C_1),\ Int(C_2) + \tau(C_2)\big),$$

where $\tau$ is a threshold function based on the size of the components, specified in the experiments.
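The merge rule can be sketched in plain Python with union-find, using the size-based threshold $\tau(C) = k/|C|$ from [22]. Whether this paper uses exactly that threshold is an assumption, and `k` is a hypothetical parameter.

```python
def segment_graph(num_nodes, edges, k=1.0):
    """Greedy graph segmentation in the style of [22].

    edges: iterable of (i, j, w) with w the dissimilarity between particles i and j.
    Components C1, C2 are merged when the connecting edge weight, which equals Dif(C1, C2)
    because edges are visited in ascending order, is at most
    MInt = min(Int(C1) + k/|C1|, Int(C2) + k/|C2|).
    Returns a component label for every node.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # Int(C): largest MST edge weight inside the component

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j, w in sorted(edges, key=lambda e: e[2]):
        a, b = find(i), find(j)
        if a == b:
            continue
        m_int = min(internal[a] + k / size[a], internal[b] + k / size[b])
        if w <= m_int:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a  # union by size
            size[a] += size[b]
            internal[a] = w  # edges arrive in ascending order, so w is the new maximum
    return [find(n) for n in range(num_nodes)]
```

A larger `k` favors larger particle groups, because small components receive a looser merge threshold.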

The results of crowd segmentation are presented in Section 5.

4.2. Crowd Abnormal Behavior Detection

Abnormal behavior detection is one of the most challenging tasks in crowd analysis because crowd abnormal behavior is difficult to define formally. Generally speaking, crowd abnormal behavior is defined as a sudden change or irregularity in crowd motion, such as escape panic in public places or a dramatic increase in crowd density. In this section, we describe how to detect crowd abnormal behavior based on our crowd motion model.

Using the crowd segmentation described above, we cluster the crowd sequence into particle groups with similar motion patterns. We view each group as a visual word, represented by its interaction social force feature and its streak flow feature. We then generate a codebook with the $k$-means algorithm. Crowd abnormal behavior detection is thus treated as a word-document analysis problem. We adopt the Latent Dirichlet Allocation model [19] to infer the distribution of visual words and then judge whether an abnormal event has occurred.
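A minimal Python/NumPy sketch of the codebook step follows. It assumes each particle group has already been summarized as a fixed-length feature vector, for instance statistics of its interaction social force and streak flow; the exact descriptor is not specified here. The sketch uses Lloyd's k-means with a deterministic farthest-point initialization, and the function names are hypothetical. Each frame is then represented by the histogram of the visual words of its particle groups.

```python
import numpy as np


def _assign(features, centers):
    """Index of the nearest codeword (squared Euclidean distance) for each feature row."""
    return ((features[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)


def build_codebook(features, k, iters=50):
    """Lloyd's k-means with farthest-point initialization.

    features: (N, D) float array, one row per particle group.
    Returns (codebook of shape (k, D), visual word index of each group).
    """
    centers = [features[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((features - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(dist)])  # farthest point from the chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        words = _assign(features, centers)
        new = np.array([features[words == c].mean(axis=0) if np.any(words == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, _assign(features, centers)


def bag_of_words(words_in_frame, k):
    """Histogram of visual words for one frame (one 'document')."""
    return np.bincount(words_in_frame, minlength=k)
```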

LDA is a well-known topic model for document analysis. It views the generation of a document as the random selection of document topics and topic words. In crowd analysis, crowd behavior is represented as the result of a random distribution of latent topics, where each topic is a distribution over visual words. In this paper, we view the whole crowd video as a corpus and each crowd image as a document. Each document is modeled as a mixture of topics with topic mixture parameter $\theta$.

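The LDA model is trained to obtain the parameters $\alpha$ and $\beta$. Here $\alpha$ parameterizes the Dirichlet prior over the topic mixtures, and $\beta$ gives the conditional probability of each visual word given a topic. During inference, the LDA model infers the topic distribution of a document. Given $\alpha$ and $\beta$, the posterior distribution of the topic mixture $\theta$ and the topic assignments $\mathbf{z}$ for the visual words $\mathbf{w}$ of a document is [19]

$$p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}.$$

Crowd abnormal behavior detection is then achieved by comparing the topic distribution of the current crowd frame with the model learned from the training data.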
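As a rough illustration of the detection step, the Python/NumPy sketch below scores a frame against topics learned from training data. It simplifies the LDA inference of [19] in three ways:
- the topic-word probabilities learned during training are held fixed;
- the frame's topic mixture is estimated by maximum-likelihood EM without the Dirichlet prior;
- the frame is scored by its average log-likelihood per visual word, so a low score suggests an abnormal frame.

The threshold on this score would be chosen on training data, and the names are hypothetical.

```python
import numpy as np


def fold_in_score(word_counts, topic_word, iters=200):
    """Estimate a frame's topic mixture with topic-word probabilities held fixed, and
    return (theta, average log-likelihood per visual word).

    Simplified EM fold-in (maximum likelihood, no Dirichlet prior), not the variational
    inference of full LDA. topic_word: (T, V) array whose rows sum to 1.
    """
    counts = np.asarray(word_counts, dtype=float)
    num_topics = topic_word.shape[0]
    theta = np.full(num_topics, 1.0 / num_topics)
    for _ in range(iters):
        joint = theta[:, None] * topic_word              # (T, V): theta_t * p(word | topic t)
        resp = joint / joint.sum(axis=0, keepdims=True)  # p(topic | word)
        theta = (resp * counts).sum(axis=1) / counts.sum()
    word_prob = theta @ topic_word                       # p(word) under the frame's mixture
    mask = counts > 0
    loglik = float((counts[mask] * np.log(word_prob[mask])).sum() / counts.sum())
    return theta, loglik
```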

5. Experiments and Discussion

In this section, we perform experiments on several publicly available crowd video datasets, applying our method to common crowd motion analysis problems. The goals of our experiments are (1) to segment crowd video sequences into different motion patterns and (2) to detect abnormal behavior in crowd video.

5.1. Results of Crowd Segmentation

Here we present the results of our crowd segmentation algorithm. We use the same experimental video as [18], which shows an intersection in Boston. The Boston video includes 531 frames and covers three phases of crowd behavior across two traffic-light changes: (1) traffic is formed, (2) the traffic lights change and a new traffic flow emerges, and (3) the traffic lights change again and another flow develops. We compare the proposed method with two baselines: the method in [18] (denoted as SRF), and the proposed method without the initial spatiotemporal analogous patches (denoted as proposedNC).

Figure 4 presents the segmentation results of the three methods. We describe the results frame by frame.

In frame 32, vehicles from the north are moving south and one person is walking on the sidewalk from east to west. Our proposed method correctly segments the vehicle flow (1st row, red) and the person (1st row, green). SRF, however, cannot distinguish the different motion patterns of the vehicles and the person (3rd row, green), and proposedNC even misses the motion of the person (2nd row, green).

In frame 146, our method detects the pedestrians walking on the sidewalk as one group (1st row, red), whereas SRF splits them into different groups (3rd row, yellow and purple).

In frame 212, our method correctly segments cars moving in different directions (1st row, green and red), but SRF views them as the same group (3rd row, red).

In frame 425, the bottom pedestrian flow actually contains two motion directions: some people are walking east and some are going straight. Our method correctly detects the two motion patterns (1st row, purple and aquamarine), whereas SRF views them as one group (3rd row, aquamarine).

These results show that our method captures the temporal changes of the crowd sequence, such as the car flow from north to south in frame 32. It also depicts local motion changes, such as the cars moving in different directions in frame 212 and the two motion patterns of pedestrians in frame 425.

Figure 5 shows the quantitative comparison of the three methods. In our experiment, we manually label the segmented regions frame by frame according to the actual motion in the scene. We then count the correctly and incorrectly segmented regions in the crowd sequence. Figure 5 shows that our method is inferior to the other two methods in the number of incorrectly segmented motions. This is due to the different labeling criteria used in [18] and in our method. In [18], a segmented region is counted as correct when the direction difference between an object and the majority of objects in the region is less than 90 degrees. In our experiment, however, we evaluate the segmentation according to the actual direction of motion groups, which sharply increases the number of incorrect segmentations.

5.2. Results of Crowd Abnormal Behavior Detection

In this part, our method is tested on three public datasets: PETS2009 (http://www.cvg.reading.ac.uk/PETS2009/a.html), University of Minnesota (UMN) (http://mha.cs.umn.edu/proj_events.shtml#crowd), and a collection of videos from websites [14].

PETS2009 contains different crowd motion patterns, such as walking, running, gathering, and splitting. We select 1066 frames from the “S3” sequence for our experiments. In this dataset, a change in crowd movement from walking to running is defined as abnormal behavior.

UMN includes eleven escape event sequences shot in three different indoor and outdoor scenes. We select 1377 frames from the UMN dataset, and the abnormal behavior in this dataset is crowd escape.

The web dataset includes crowd videos captured at a road intersection and contains 3071 frames. These videos contain two types of motion patterns, waiting and crossing, and pedestrian crossing is considered abnormal behavior.

We manually label all videos as ground truth. From the three datasets, we randomly select 40% of the frames for training and use the remaining frames for testing. Figure 6 shows sample frames from the three datasets: the first row is from PETS2009, the second row from UMN, and the last row from the web dataset.

In the streak flow phase, we downsample the original pixels to speed up the experiments; the sampling rates for the three datasets are 30%, 50%, and 20%, respectively. To account for the different characteristics of the videos, the parameters in formula (7) are set separately for each dataset. After constructing the social force graph, we segment the video sequence into different motion groups. Visual words are then generated by clustering all motion groups, with codebook sizes of 32, 64, and 100, respectively. In the LDA phase, we learn latent topics for abnormal behavior detection. We compare our method with three baselines: the social force based method [5] (denoted as SFM), the streak flow based method [18] (denoted as StrFM), and an optical flow based method (denoted as OFM). For each comparison method, we obtain visual words from the corresponding visual features and then train the LDA model on these words.

Figures 7–9 show sample detection results on the three datasets. In each figure, the first image is the first frame and the second image is an abnormal frame of the experimental scene. Color bars represent the detection result for each frame in the crowd sequence: green denotes a normal frame and red an abnormal frame. The first bar is the ground truth, and the remaining bars show the results of our proposed method, SFM, StrFM, and OFM, respectively. Overall, these results show that our proposed method is able to detect abnormal crowd dynamics, and in most cases it outperforms the other algorithms.

Figure 7 shows partial results on the running sequence of PETS2009, Figure 8 shows some results on the UMN escape videos, and Figure 9 shows part of the results on the web video. The results of the proposed method conform better to the ground truth. We attribute this to the method considering not only the temporal characteristics of moving objects but also the spatial social force relationships between them.

In Figure 7, SFM obtains an even better result than our proposed method because the social force increases sharply in the running scene. However, SFM produces more false detections in some crowd scenes, for example, crowd walking (Figure 8) and road crossing (Figure 9). Most of the social force in these scenes is nearly zero, which makes it difficult for SFM to describe this type of crowd motion.

OFM is clearly inferior to our method because optical flow fails to capture motion over a period of time. In Figure 9 in particular, OFM produces more false detections for road crossing: most of the walking behavior is irregular, and optical flow fails to capture irregular motion patterns.

StrFM performs better than OFM in all three figures, but it still has more false detections than our method and SFM. It attains a comparable result when the motion dynamics are relatively simple (Figure 7), but its performance degrades when substantial new flows continually enter the scene (Figure 9).

To evaluate our method quantitatively, Table 1 reports the accuracy of abnormal behavior detection. Our proposed model improves accuracy by nearly 5% compared with the social force model. In higher-density crowd scenes in particular, the social force of particle groups changes slowly, so the social force model is less discriminative for normal behaviors. For example, in Table 1 the accuracy of SFM is lower than that of our proposed method and even lower than that of the optical flow method.

Figure 10 compares the ROC curves of our proposed method and the alternative methods for abnormal behavior detection; our method provides better results.
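Frame-level accuracy and an ROC curve can be computed from per-frame abnormality scores with the Python/NumPy sketch below. It is a generic illustration with hypothetical names, not the evaluation code used for Table 1 or Figure 10. The ROC sweep uses every distinct score as a threshold, and the area under the curve is computed with the trapezoidal rule.

```python
import numpy as np


def frame_accuracy(pred, truth):
    """Fraction of frames whose normal/abnormal label matches the ground truth."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    return float((pred == truth).mean())


def roc_curve(scores, truth):
    """False and true positive rates when every distinct score is used as the threshold.

    scores: higher means more abnormal; truth: True for abnormal frames.
    """
    scores = np.asarray(scores, float)
    truth = np.asarray(truth, bool)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr = np.array([(scores[truth] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[~truth] >= t).mean() for t in thresholds])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```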

6. Conclusion

In this paper, we proposed a novel framework for analyzing motion patterns in crowd video. For spatiotemporal features, we extracted the streak flow to represent global motion. For the interaction of crowd objects, we constructed a weighted social force graph and extracted particle groups that represent local group motion. Finally, we used the LDA model to detect crowd abnormal behavior based on the proposed crowd model. The experimental results show that our method successfully segments crowd motion and detects crowd abnormal behavior, and that it outperforms current state-of-the-art methods for crowd motion analysis. In future work, we plan to further study crowd motion behavior in high-density scenes and to improve the robustness of our algorithm.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Research Foundation of Education Bureau of Hunan Province, China (Grant no. 13C474).