Abstract

We propose an algorithm for detecting abnormal events, a challenging but important problem in video surveillance. The algorithm consists of an image descriptor and an online nonlinear classification method. We introduce the covariance matrix of the optical flow and image intensity as a descriptor encoding movement information. The nonlinear online support vector machine (SVM) first learns a limited set of training frames to provide a basic reference model, and then updates the model and detects abnormal events in the current frame. Finally, we apply the method to a benchmark video surveillance dataset to demonstrate the effectiveness of the proposed technique.

1. Introduction

Visual surveillance is one of the major research areas in computer vision. In crowd image analysis, abnormal event detection is one of the main scientific challenges. For instance, Figure 1(a) illustrates a normal scene where people are walking. In Figure 1(b), all the people suddenly run in different directions. The dataset imitates panic-driven scenes.

Trajectory analysis of objects was described in [1–3]. The moving object was labeled by a blob in consecutive frames, and a trajectory was then produced. Deviations from the learned trajectories were defined as abnormal events. Tracking-based approaches are suitable for sparse scenes with few objects; in crowded scenes, the target may be lost due to occlusion.

In [4, 5], abnormal detection approaches using features that encode the motion, texture, and size of objects were introduced. Local image regions in a video were analyzed with a background subtraction method; a dynamic Bayesian network (DBN) was then constructed to model normal and abnormal behavior, and finally a likelihood ratio test was applied to detect abnormal behaviors. In [6], a space-time Markov random field (MRF) model for detecting abnormal activities in video was proposed, in which a mixture of probabilistic principal component analyzers (MPPCA) modeled the local optical flow. These predictions rely on probabilistic techniques that assume an accurate model exists; in many situations, however, a robust and tractable model cannot be obtained, so model-free methods need to be studied.

Spatiotemporal motion features described in a bag-of-video-words framework have also been adopted to detect abnormal events. In [7], the authors presented an algorithm that monitored optical flow at a set of fixed spatial positions and constructed a histogram of optical flow. The likelihood of the behavior in a newly arriving frame under the probability distribution of the learned behavior was computed; if the likelihood fell below a preset threshold, the behavior was considered abnormal. In [8], irregular behavior in images or videos was detected by an inference process in a probabilistic graphical model. In [9, 10], video pixels were densely sampled to form the features. These methods rely on partial information of the images, such as small blocks in a frame, without fully exploiting the global information of the features. In [11–13], spatiotemporal features modeled the motion regions of the frame as background, and anomalies were detected by subtracting the new sample from the background template. These works resemble change detection methods applied when the background is not stable.

In this paper, the proposed algorithm is composed of two parts. First, a covariance feature descriptor is constructed over the whole video frame; then a nonlinear one-class support vector machine algorithm is applied in an online fashion to detect abnormal events. The features are extracted from the optical flow, which represents the movement information. Experiments on a real surveillance video dataset show that our online abnormal detection techniques achieve satisfactory performance. The rest of the paper is organized as follows. In Section 2, the covariance matrix descriptor of motion features is introduced. In Section 3, the online one-class SVM classification method is presented. In Section 4, two abnormal detection strategies based on the online nonlinear one-class SVM are proposed. In Section 5, we present results on real-world video scenes. Finally, Section 6 concludes the paper.

2. Covariance Descriptor of Frame Behavior

The optical flow is a feature that represents the direction and amplitude of a movement. It provides important information about the spatial arrangement of the objects and the rate of change of this arrangement [14]. We adopt the Horn-Schunck (HS) optical flow computation method in our work. The optical flow of the gray-scale image is formulated as the minimizer of the following global energy functional:

$$E(u, v) = \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] dx\, dy,$$

where $I$ is the intensity of the image; $I_x$, $I_y$, and $I_t$ are the derivatives of the image intensity along the $x$, $y$, and time dimensions; $u$ and $v$ are the components of the optical flow in the horizontal and vertical directions; and $\alpha$ is the weight of the regularization term.
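The energy above is usually minimized with an iterative Jacobi-style scheme derived from its Euler-Lagrange equations. The sketch below assumes NumPy and two gray-scale frames given as 2-D arrays, and follows the classic update from Horn and Schunck's original paper (3x3 weighted neighborhood average, derivatives averaged over both frames). The function names `horn_schunck` and `_neighbor_average`, the derivative approximations, and the iteration count are illustrative choices, not the authors' implementation.

```python
import numpy as np


def _neighbor_average(f: np.ndarray) -> np.ndarray:
    """Weighted 3x3 neighborhood average (center excluded) used by Horn-Schunck."""
    p = np.pad(f, 1, mode="edge")
    # 4-neighbors weighted 1/6, diagonal neighbors weighted 1/12 (weights sum to 1)
    side = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 6.0
    diag = (p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]) / 12.0
    return side + diag


def horn_schunck(frame0, frame1, alpha=1.0, n_iter=100):
    """Estimate dense optical flow (u, v) between two gray-scale frames."""
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    # np.gradient returns derivatives along rows (y) first, then columns (x)
    Iy0, Ix0 = np.gradient(f0)
    Iy1, Ix1 = np.gradient(f1)
    Ix = 0.5 * (Ix0 + Ix1)   # spatial derivatives averaged over both frames
    Iy = 0.5 * (Iy0 + Iy1)
    It = f1 - f0             # temporal derivative as a frame difference
    u = np.zeros_like(f0)
    v = np.zeros_like(f0)
    denom = alpha**2 + Ix**2 + Iy**2
    for _ in range(n_iter):
        u_bar = _neighbor_average(u)
        v_bar = _neighbor_average(v)
        # Jacobi update: pull the smoothed flow toward the brightness-constancy line
        common = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```

For a smooth pattern translated by one pixel to the right between `frame0` and `frame1`, the mean of `u` should come out close to one pixel per frame and the mean of `v` close to zero; a larger `alpha` gives smoother but slower-converging flow fields.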

We introduce the covariance matrix encoding the optical flow and intensity of each frame as the descriptor representing the movement. The covariance feature descriptor was originally proposed by Tuzel et al. [15] for pattern matching in a target tracking problem. The feature image is defined as

$$F(x, y) = \left[ \phi_1(I, x, y), \dots, \phi_d(I, x, y) \right]^\top,$$

where $I$ is the color information of an image (which can be gray, RGB, HSV, HLS, etc.), $\phi_i$ is a mapping relating the image to its $i$-th feature, $F$ is the $W \times H \times d$ dimensional feature extracted from image $I$, $W$ and $H$ are the image width and height, and $d$ is the number of chosen features. For each frame, the features are summarized by the covariance matrix

$$C = \frac{1}{n-1} \sum_{k=1}^{n} (z_k - \mu)(z_k - \mu)^\top, \qquad \mu = \frac{1}{n} \sum_{k=1}^{n} z_k,$$

where $n$ is the number of pixels sampled in the frame, $z_k$ is the feature vector of pixel $k$, $\mu$ is the mean of all the selected points, and $C$ is the $d \times d$ covariance matrix of the feature vector $z$. The covariance descriptor of a frame carries no information about the ordering or the number of the sampled points [15]. Because the mapping $\phi$ can be designed to combine different features, the covariance matrix descriptor offers a natural way to fuse multiple cues. Different choices of feature vectors are listed in Table 1, where $I$ is the intensity of the gray image; $u$ and $v$ are the horizontal and vertical components of the optical flow; $I_x$, $u_x$, and $v_x$ are the first derivatives of the intensity, horizontal optical flow, and vertical optical flow in the $x$ direction, respectively; $I_y$, $u_y$, and $v_y$ are the first derivatives of the corresponding features in the $y$ direction; $I_{xx}$, $u_{xx}$, and $v_{xx}$ are the second derivatives in the $x$ direction; and $I_{yy}$, $u_{yy}$, and $v_{yy}$ are the second derivatives in the $y$ direction. The flowchart of the covariance matrix descriptor computation is shown in Figure 2. The optical flow and its partial derivatives characterize the interframe information, that is, the movement. The intensity of the frame and its partial derivatives describe the intraframe information; they encode the appearance of the frame.
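As a rough illustration of the descriptor, the sketch below (assuming NumPy) stacks one possible per-pixel feature vector $[I, u, v, I_x, I_y, u_x, u_y, v_x, v_y]$ and computes its covariance over a regular pixel grid. This particular 9-feature choice, the sampling `step`, and the function names are hypothetical; Table 1 of the paper lists the feature vectors actually compared.

```python
import numpy as np


def frame_features(intensity, u, v):
    """Stack per-pixel features [I, u, v, I_x, I_y, u_x, u_y, v_x, v_y].

    This 9-feature choice is only one illustrative option.
    """
    channels = [np.asarray(c, dtype=float) for c in (intensity, u, v)]
    feats = list(channels)
    for c in channels:
        c_y, c_x = np.gradient(c)  # derivatives along rows (y), then columns (x)
        feats.extend([c_x, c_y])
    # Shape (H, W, d): one d-dimensional feature vector per pixel
    return np.stack(feats, axis=-1)


def covariance_descriptor(features, step=1):
    """d x d covariance of the feature vectors sampled every `step` pixels."""
    z = features[::step, ::step].reshape(-1, features.shape[-1])  # (n, d)
    mu = z.mean(axis=0)
    centered = z - mu
    # Unbiased estimate, matching the 1/(n-1) normalization of the descriptor
    return centered.T @ centered / (z.shape[0] - 1)
```

The result is symmetric and positive semidefinite by construction; with `step=1` it coincides with `np.cov` applied to all pixel feature vectors.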

If proper parameters are chosen, the traditionally used kernels, such as the Gaussian, polynomial, and sigmoidal kernels, have similar performance [19]. We choose the Gaussian kernel for our features. However, the covariance matrix is an element of a Lie group, so the Gaussian kernel defined on Euclidean spaces is not suitable for covariance descriptors. The Gaussian kernel on the Lie group is defined as [20, 21]

$$\kappa(X_i, X_j) = \exp\left( -\frac{\| \log(X_i^{-1} X_j) \|^2}{2\sigma^2} \right),$$

where $X_i$ and $X_j$ are matrices in the Lie group $G$ and $\sigma$ is the kernel parameter.
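A minimal NumPy sketch of this kernel for symmetric positive-definite (SPD) covariance descriptors follows. Because $X_i^{-1} X_j$ is similar to the symmetric matrix $X_i^{-1/2} X_j X_i^{-1/2}$, the sketch evaluates the norm of the logarithm as the square root of the sum of squared log-eigenvalues of that symmetric form. Taking this norm (rather than, say, a Frobenius norm of a non-symmetric logarithm) is our assumption, as are the function names.

```python
import numpy as np


def _spd_power(X, p):
    """X**p for a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * w**p) @ V.T  # V diag(w**p) V^T


def lie_gaussian_kernel(Xi, Xj, sigma=1.0):
    """exp(-||log(Xi^{-1} Xj)||^2 / (2 sigma^2)) for SPD matrices Xi, Xj."""
    Xi_inv_sqrt = _spd_power(Xi, -0.5)
    M = Xi_inv_sqrt @ Xj @ Xi_inv_sqrt  # same eigenvalues as Xi^{-1} Xj
    M = 0.5 * (M + M.T)                 # remove round-off asymmetry
    log_eigs = np.log(np.linalg.eigvalsh(M))
    dist_sq = np.sum(log_eigs**2)
    return float(np.exp(-dist_sq / (2.0 * sigma**2)))
```

With this choice the kernel equals 1 when $X_i = X_j$, is symmetric in its arguments, and is unchanged when both matrices are scaled by the same positive factor, properties a Euclidean Gaussian kernel on matrix entries would not have.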

3. Online One-Class SVM

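The essence of an abnormal detection problem is that only normal scene samples are available, so the one-class SVM framework is well suited to it. The support vector machine (SVM) was initially proposed by Vapnik and Lerner [22, 23]. It is a method based on statistical learning theory that performs well at classifying data and recognizing patterns. There are two frameworks of one-class SVM: support vector data description (SVDD), presented in [24, 25], and the $\nu$-support vector classifier ($\nu$-SVC), introduced in [26]. We adopt the SVDD formulation. It computes a sphere-shaped decision boundary with minimal volume around a set of objects. The center $a$ and the radius $R$ of the sphere are determined via the following optimization problem:

$$\min_{R,\, a,\, \xi} \; R^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad \|\phi(x_i) - a\|^2 \le R^2 + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, n,$$

where $n$ is the number of training samples and $\xi_i$ is the slack variable penalizing outliers. The hyperparameter $C$ weights the slack variables; it tunes the number of acceptable outliers. The nonlinear function $\phi: \mathcal{X} \rightarrow \mathcal{H}$ maps a datum into the feature space $\mathcal{H}$; this allows a nonlinear classification problem to be solved by designing a linear classifier in $\mathcal{H}$. $\kappa$ is the kernel function computing dot products in $\mathcal{H}$, $\kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$. By introducing Lagrange multipliers, the dual of this problem is the following quadratic optimization problem:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i \, \kappa(x_i, x_i) - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, \kappa(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i = 1, \;\; 0 \le \alpha_i \le C,$$

and the center is $a = \sum_{i=1}^{n} \alpha_i \, \phi(x_i)$. The decision function is

$$f(x) = \operatorname{sgn}\left( R^2 - \|\phi(x) - a\|^2 \right), \qquad \|\phi(x) - a\|^2 = \kappa(x, x) - 2 \sum_{i=1}^{n} \alpha_i \, \kappa(x_i, x) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, \kappa(x_i, x_j).$$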
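Once a quadratic programming solver has returned the dual coefficients $\alpha_i$, the decision function only needs kernel evaluations. The NumPy sketch below assumes the Gram matrix of the training samples and the $\alpha_i$ are already available (the QP solve itself is not shown). It estimates $R^2$ as the mean distance of the unbounded support vectors ($0 < \alpha_k < C$), which is common SVDD practice rather than something stated here. All function names are hypothetical.

```python
import numpy as np


def svdd_distance_sq(K_train, k_x, k_xx, alpha):
    """||phi(x) - a||^2 with a = sum_i alpha_i phi(x_i).

    K_train: (n, n) Gram matrix kappa(x_i, x_j) of the training samples
    k_x:     (n,) vector of kappa(x_i, x) for the datum x
    k_xx:    scalar kappa(x, x)
    alpha:   (n,) dual coefficients (sum to 1, 0 <= alpha_i <= C)
    """
    alpha = np.asarray(alpha, dtype=float)
    return float(k_xx - 2.0 * alpha @ k_x + alpha @ K_train @ alpha)


def svdd_radius_sq(K_train, alpha, C, tol=1e-8):
    """R^2 averaged over unbounded support vectors (0 < alpha_k < C)."""
    alpha = np.asarray(alpha, dtype=float)
    on_boundary = np.where((alpha > tol) & (alpha < C - tol))[0]
    # Row k of the Gram matrix holds kappa(x_i, x_k) for training point x_k
    dists = [svdd_distance_sq(K_train, K_train[k], K_train[k, k], alpha)
             for k in on_boundary]
    return float(np.mean(dists))


def svdd_decide(dist_sq, radius_sq):
    """+1 (normal) inside or on the sphere, -1 (abnormal) outside."""
    return 1 if dist_sq <= radius_sq else -1
```

Points exactly on the sphere, where the sign function would return 0, are treated as normal in this sketch.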

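For large training sets, the solution cannot be obtained easily, so an online strategy is used to train the model. Let $a_{\mathcal{D}}$ denote a sparse model of the center built from a small subset of the available samples, called the dictionary:

$$a_{\mathcal{D}} = \sum_{j=1}^{m} \alpha_j \, \phi(x_{w_j}),$$

where $\mathcal{D} = \{x_{w_1}, \dots, x_{w_m}\}$ and $m$ denotes the cardinality of $\mathcal{D}$. The distance of a mapped datum $\phi(x)$ to the center $a_{\mathcal{D}}$ is

$$\|\phi(x) - a_{\mathcal{D}}\|^2 = \kappa(x, x) - 2 \sum_{j=1}^{m} \alpha_j \, \kappa(x_{w_j}, x) + \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, \kappa(x_{w_i}, x_{w_j}).$$

A modification of the original one-class formulation consists of minimizing the approximation error [27, 28]:

$$\boldsymbol{\alpha} = \arg\min_{\alpha_1, \dots, \alpha_m} \left\| \frac{1}{n} \sum_{k=1}^{n} \phi(x_k) - \sum_{j=1}^{m} \alpha_j \, \phi(x_{w_j}) \right\|^2.$$

Setting the gradient with respect to $\boldsymbol{\alpha}$ to zero gives the final solution

$$\boldsymbol{\alpha} = \mathbf{K}^{-1} \boldsymbol{\kappa},$$

where $\mathbf{K}$ is the $m \times m$ Gram matrix with $(i, j)$-th entry $\kappa(x_{w_i}, x_{w_j})$ and $\boldsymbol{\kappa}$ is the column vector with entries $\frac{1}{n} \sum_{k=1}^{n} \kappa(x_k, x_{w_i})$, $i = 1, \dots, m$.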
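The closed-form solution $\boldsymbol{\alpha} = \mathbf{K}^{-1}\boldsymbol{\kappa}$ can be checked on a toy case. The NumPy sketch below takes a generic `kernel(a, b)` callable (hypothetical; for covariance descriptors it would be the Lie group kernel) and solves the linear system rather than forming the inverse. With a linear kernel and a dictionary spanning the input space, the sparse center reproduces the empirical mean exactly, so the mean itself is at distance zero.

```python
import numpy as np


def sparse_center_coefficients(kernel, samples, dictionary):
    """alpha = K^{-1} kappa, approximating the mean of phi(x_k) over samples."""
    K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
    # kappa_i = (1/n) sum_k kappa(x_k, x_{w_i})
    kappa = np.array([np.mean([kernel(x, w) for x in samples])
                      for w in dictionary])
    # Solve K alpha = kappa instead of inverting K explicitly
    return np.linalg.solve(K, kappa), K


def distance_to_sparse_center_sq(kernel, x, dictionary, alpha, K):
    """||phi(x) - a_D||^2 with a_D = sum_j alpha_j phi(x_{w_j})."""
    k_x = np.array([kernel(w, x) for w in dictionary])
    return float(kernel(x, x) - 2.0 * alpha @ k_x + alpha @ K @ alpha)
```

Note that only the $m$ dictionary elements are needed at test time; the samples enter through the averaged vector $\boldsymbol{\kappa}$, which is what makes the online recursions of this section possible.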

In the online scheme, a new sample arrives at each time step. Let $\boldsymbol{\alpha}_t$, $\mathbf{K}_t$, and $\boldsymbol{\kappa}_t$ denote the coefficient vector, the Gram matrix, and the vector $\boldsymbol{\kappa}$ at time step $t$. A criterion determines whether the new sample is included in the dictionary. Given a preset threshold $\mu_0$, the coherence-based sparsification criterion [29, 30] for the datum $x_t$ at time step $t$ is

$$\max_{j = 1, \dots, m} |\kappa(x_t, x_{w_j})| \le \mu_0.$$

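First Case ($\max_{j = 1, \dots, m} |\kappa(x_t, x_{w_j})| > \mu_0$). In this case, the new datum is not included in the dictionary. The Gram matrix is unchanged, $\mathbf{K}_t = \mathbf{K}_{t-1}$, while $\boldsymbol{\kappa}_t$ changes online:

$$\boldsymbol{\kappa}_t = \frac{t-1}{t} \boldsymbol{\kappa}_{t-1} + \frac{1}{t} \mathbf{b}_t,$$

where $\mathbf{b}_t$ is the column vector with entries $\kappa(x_t, x_{w_j})$, $j = 1, \dots, m$.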

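Second Case ($\max_{j = 1, \dots, m} |\kappa(x_t, x_{w_j})| \le \mu_0$). In this case, the new datum $x_t$ is included in the dictionary, $\mathcal{D}_t = \mathcal{D}_{t-1} \cup \{x_t\}$, and the Gram matrix grows:

$$\mathbf{K}_t = \begin{bmatrix} \mathbf{K}_{t-1} & \mathbf{b}_t \\ \mathbf{b}_t^\top & \kappa(x_t, x_t) \end{bmatrix},$$

where $\mathbf{b}_t$ is the column vector with entries $\kappa(x_t, x_{w_j})$, $j = 1, \dots, m$. By using the Woodbury matrix identity, $\mathbf{K}_t^{-1}$ can be calculated iteratively:

$$\mathbf{K}_t^{-1} = \begin{bmatrix} \mathbf{K}_{t-1}^{-1} & \mathbf{0} \\ \mathbf{0}^\top & 0 \end{bmatrix} + \frac{1}{\kappa(x_t, x_t) - \mathbf{b}_t^\top \mathbf{K}_{t-1}^{-1} \mathbf{b}_t} \begin{bmatrix} -\mathbf{K}_{t-1}^{-1} \mathbf{b}_t \\ 1 \end{bmatrix} \begin{bmatrix} -\mathbf{b}_t^\top \mathbf{K}_{t-1}^{-1} & 1 \end{bmatrix}.$$

The vector $\boldsymbol{\kappa}_t$ is updated from $\boldsymbol{\kappa}_{t-1}$ as

$$\boldsymbol{\kappa}_t = \begin{bmatrix} \frac{t-1}{t} \boldsymbol{\kappa}_{t-1} + \frac{1}{t} \mathbf{b}_t \\ \kappa_{m+1} \end{bmatrix}, \qquad \kappa_{m+1} = \frac{1}{t} \sum_{k=1}^{t} \kappa(x_k, x_t).$$

Computing $\kappa_{m+1}$ this way requires keeping all the samples in memory. To overcome this issue, $\kappa_{m+1}$ is instead computed from an instant estimate. The coefficients are then updated as $\boldsymbol{\alpha}_t = \mathbf{K}_t^{-1} \boldsymbol{\kappa}_t$. Based on these recursions, we obtain an online implementation of the learning phase of the one-class SVM.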
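The two cases can be combined into a single online learner. The NumPy sketch below keeps $\mathbf{K}_t^{-1}$ and $\boldsymbol{\kappa}_t$, applies the coherence test, and grows the inverse with the block-inversion form of the identity. The instant estimate of the new entry $\kappa_{m+1}$ is not specified in the text; here we hypothetically keep only the term $\frac{1}{t}\kappa(x_t, x_t)$ of the sum and drop the unavailable past terms. The class name `OnlineOneClass` is illustrative, and the first sample is assumed to seed the dictionary.

```python
import numpy as np


class OnlineOneClass:
    """Coherence-sparsified online estimate of the one-class center (sketch)."""

    def __init__(self, kernel, mu0):
        self.kernel = kernel
        self.mu0 = mu0          # coherence threshold mu_0
        self.dictionary = []    # x_{w_1}, ..., x_{w_m}
        self.K_inv = None       # inverse Gram matrix K_t^{-1}
        self.kappa = None       # vector kappa_t
        self.t = 0              # number of samples seen so far

    @property
    def alpha(self):
        """Coefficients alpha_t = K_t^{-1} kappa_t."""
        return self.K_inv @ self.kappa

    def update(self, x):
        """Process one sample; return True if it entered the dictionary."""
        self.t += 1
        t = self.t
        k_xx = self.kernel(x, x)
        if not self.dictionary:
            # First sample seeds the dictionary; kappa_1 = kappa(x_1, x_1) exactly
            self.dictionary.append(x)
            self.K_inv = np.array([[1.0 / k_xx]])
            self.kappa = np.array([k_xx])
            return True
        b = np.array([self.kernel(x, w) for w in self.dictionary])
        # Running mean of kappa(x_k, x_{w_j}) for the existing entries (both cases)
        self.kappa = (t - 1) / t * self.kappa + b / t
        if np.max(np.abs(b)) > self.mu0:
            return False  # first case: dictionary and Gram matrix unchanged
        # Second case: grow K^{-1} by block inversion (Schur complement s)
        Kb = self.K_inv @ b
        s = k_xx - b @ Kb
        m = len(self.dictionary)
        grown = np.zeros((m + 1, m + 1))
        grown[:m, :m] = self.K_inv
        vec = np.append(-Kb, 1.0)
        self.K_inv = grown + np.outer(vec, vec) / s
        # Hypothetical instant estimate: keep only the current term of the sum
        self.kappa = np.append(self.kappa, k_xx / t)
        self.dictionary.append(x)
        return True
```

Each step costs $O(m^2)$ operations and memory stays proportional to the dictionary size, independent of how many samples have been seen.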

4. Abnormal Events Detection

In an abnormal event detection problem, it is assumed that a set of training frames (the positive class) describing the normal behavior is available. The general architectures for abnormal detection are introduced below.

The offline training strategy refers to the case where all the training samples are learned as one batch, as shown in Figure 3(a). We propose two abnormal detection strategies that differ in when the dictionary is fixed; they are shown in Figures 3(b) and 3(c). Strategy 1 is shown in Figure 3(b): the training data are learned one by one, and when the training period is finished, the dictionary and the classifier are fixed. Each test datum is then classified based on this dictionary. Figure 3(c) illustrates Strategy 2: the training procedure is the same as in Strategy 1, but during the testing period the dictionary is updated whenever a datum satisfies the dictionary update condition. The details of these two strategies are explained in the following.

4.1. Strategy 1

In Strategy 1, the dictionary is updated only during the training period.

Step 1. The first step computes the covariance matrix descriptors of the training frames from the image intensity and the optical flow. This step can be summarized as

$$\{(I_t, u_t, v_t)\}_{t=1}^{n} \; \longrightarrow \; \{X_t\}_{t=1}^{n},$$

where $I_t$, $u_t$, and $v_t$ are the image intensity and the corresponding optical flow components of the $t$-th frame, $t = 1, \dots, n$, and $X_t$ are the covariance matrix descriptors.

Step 2. The second step applies the one-class SVM to a small subset of the descriptors extracted from the normal training frames to obtain the support vectors. Consider a subset $\{X_{w_1}, \dots, X_{w_m}\}$, $m \ll n$, selected from the full training sample set $\{X_1, \dots, X_n\}$; without loss of generality, assume that the first $m$ examples are chosen. This set of examples is called the dictionary:

$$\mathcal{D}_0 = \{X_1, X_2, \dots, X_m\},$$

where $X_1, \dots, X_m$ are the first $m$ covariance matrix descriptors of the training frames; $\mathcal{D}_0$ is the original dictionary. In a one-class SVM, most training samples do not contribute to the decision function; only a small subset of them, the support vectors $X_{w_j}$, $j = 1, \dots, m$, defines it.

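Step 3. After learning the dictionary containing the first $m$ samples, the remaining training samples are learned online via the technique described in Section 3. This step can be summarized as

$$\mathcal{D} \leftarrow \begin{cases} \mathcal{D} \cup \{X_t\}, & \text{if } \max_{j} |\kappa(X_t, X_{w_j})| \le \mu_0, \\ \mathcal{D}, & \text{otherwise}, \end{cases} \qquad t = m+1, \dots, n,$$

where $\mathcal{D}$ is initialized with the dictionary $\mathcal{D}_0$ obtained in Step 2, $X_t$ is a new sample from the remaining training dataset, and $\mu_0$ is the coherence threshold of Section 3. In other words, if the new sample satisfies the dictionary update condition, it is included in the dictionary $\mathcal{D}$.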

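Step 4. Based on the dictionary and the classifier obtained from the training frames, each incoming frame sample is classified. The workflow of Strategy 1 is shown in Figure 4 and described by the following equation:

$$f(X_t) = \operatorname{sgn}\left( R^2 - \kappa(X_t, X_t) + 2 \sum_{j=1}^{m} \alpha_j \, \kappa(X_{w_j}, X_t) - \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, \kappa(X_{w_i}, X_{w_j}) \right),$$

where $X_t$ is the covariance matrix descriptor of the $t$-th frame to be classified, $X_{w_i}$ and $X_{w_j}$ are samples of the dictionary $\mathcal{D}$ with coefficients $\alpha_i$ and $\alpha_j$, and $R^2$ is the squared radius used as the decision threshold. "$+1$" corresponds to a normal frame; "$-1$" corresponds to an abnormal frame.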
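Because the dictionary is frozen in Strategy 1, the term $\sum_i \sum_j \alpha_i \alpha_j \kappa(X_{w_i}, X_{w_j})$ can be computed once and reused for every test frame. The NumPy sketch below assumes a generic `kernel(a, b)` callable, dictionary coefficients from the training phase, and a user-supplied threshold `radius_sq`; how $R^2$ is chosen is not specified here and is left as an assumption. The function name is hypothetical.

```python
import numpy as np


def strategy1_detect(kernel, test_descriptors, dictionary, alpha, radius_sq):
    """Label each test descriptor +1 (normal) or -1 (abnormal), dictionary fixed."""
    alpha = np.asarray(alpha, dtype=float)
    K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
    center_norm_sq = alpha @ K @ alpha  # ||a_D||^2, constant while D is frozen
    labels = []
    for X_t in test_descriptors:
        k_t = np.array([kernel(w, X_t) for w in dictionary])
        dist_sq = kernel(X_t, X_t) - 2.0 * alpha @ k_t + center_norm_sq
        labels.append(1 if dist_sq <= radius_sq else -1)
    return labels
```

Per test frame the cost is $m$ kernel evaluations, which is what keeps Strategy 1 cheap once training is over.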

4.2. Strategy 2

In this strategy, the dictionary is updated during both the training and testing periods. The feature extraction step (Step 1) and the online training steps (Steps 2 and 3) are the same as in Strategy 1; only the testing step differs. A newly arriving datum that is detected as normal and satisfies the dictionary update condition is included in $\mathcal{D}$, so the dictionary keeps being updated during the testing period to include new samples.

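Step 4: Strategy 2. If the incoming frame sample is classified as normal ($f(X_t) = +1$), it is checked with the criterion described in Section 3. When it satisfies the dictionary update criterion, this testing sample is included in the dictionary. This step can be summarized by the following equation:

$$\mathcal{D} \leftarrow \begin{cases} \mathcal{D} \cup \{X_t\}, & \text{if } f(X_t) = +1 \text{ and } \max_{j} |\kappa(X_t, X_{w_j})| \le \mu_0, \\ \mathcal{D}, & \text{otherwise}, \end{cases}$$

where $\mu_0$ is the coherence threshold of Section 3.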
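A compact, self-contained sketch of Strategy 2 in NumPy follows. For brevity it stores the Gram matrix and solves $\mathbf{K}\boldsymbol{\alpha} = \boldsymbol{\kappa}$ at classification time instead of maintaining $\mathbf{K}^{-1}$ recursively. It assumes (our reading of the text) that every test frame classified as normal is passed through the same online update as in training, including the running mean of $\boldsymbol{\kappa}$, while abnormal frames leave the model untouched. As in the training phase, the estimate $\frac{1}{t}\kappa(x_t, x_t)$ for a new entry of $\boldsymbol{\kappa}$, the threshold `radius_sq`, and the class name are hypothetical.

```python
import numpy as np


class Strategy2Detector:
    """Sketch of Strategy 2: frames judged normal keep updating the model."""

    def __init__(self, kernel, mu0, radius_sq):
        self.kernel, self.mu0, self.radius_sq = kernel, mu0, radius_sq
        self.dictionary = []
        self.kappa = np.zeros(0)   # running means of kappa(x_k, x_{w_j})
        self.K = np.zeros((0, 0))  # Gram matrix of the dictionary
        self.t = 0

    def learn(self, x):
        """One online step: running-mean update plus coherence test."""
        self.t += 1
        b = np.array([self.kernel(x, w) for w in self.dictionary])
        self.kappa = (self.t - 1) / self.t * self.kappa + b / self.t
        if b.size and np.max(np.abs(b)) > self.mu0:
            return  # too coherent with the dictionary: dictionary unchanged
        k_xx = self.kernel(x, x)
        m = len(self.dictionary)
        K = np.empty((m + 1, m + 1))
        K[:m, :m] = self.K
        K[:m, m] = K[m, :m] = b
        K[m, m] = k_xx
        self.K = K
        self.dictionary.append(x)
        # Hypothetical estimate of the new kappa entry: only the current term
        self.kappa = np.append(self.kappa, k_xx / self.t)

    def classify(self, x):
        """+1 (normal) if phi(x) lies within radius of the sparse center, else -1."""
        alpha = np.linalg.solve(self.K, self.kappa)
        k_x = np.array([self.kernel(w, x) for w in self.dictionary])
        dist_sq = self.kernel(x, x) - 2.0 * alpha @ k_x + alpha @ self.K @ alpha
        return 1 if dist_sq <= self.radius_sq else -1

    def test_step(self, x):
        """Classify a test frame; only frames judged normal update the model."""
        label = self.classify(x)
        if label == 1:
            self.learn(x)
        return label
```

Calling `learn` on the training descriptors and then only `classify` reproduces Strategy 1; calling `test_step` during testing gives Strategy 2, whose dictionary can keep adapting over long sequences.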

5. Abnormal Detection Result

This section presents the results of experiments conducted to analyze the performance of the proposed method. Both Strategy 1 and Strategy 2 achieve competitive performance on the UMN dataset [31].

5.1. Abnormal Visual Events Detection: Strategy 1

The results of the proposed abnormal event detection method using the Strategy 1 online one-class SVM on the UMN dataset [31] are shown below.

The UMN dataset includes eleven video sequences of crowd escape events in three different scenes (lawn, indoor, and plaza). The normal samples, used for training or for normal testing, are the frames where people are walking in different directions. The samples for abnormal testing are the frames where people are running. The detection results for the lawn, indoor, and plaza scenes are shown in Figures 5, 6, and 7, respectively. The Gaussian kernel on the Lie group is used in all three scenes. Different values of the kernel parameter and the penalty factor are tested, and the area under the ROC curve is shown as a function of these parameters [32]. The results show that using the covariance matrix as the descriptor yields satisfactory abnormal detection performance. Moreover, training on the samples online achieves detection performance similar to training on all the samples offline, so the online one-class SVM is appropriate for detecting abnormal visual events. There are frames in the lawn scene, and normal frames are used for training. In the offline strategy, the covariance matrices of all frames must be kept in memory. In Strategy 1, the covariance matrices of 100 frames first form the dictionary. When feature is adopted to construct the covariance descriptor, the variance of the Gaussian kernel is , the preset threshold of the criterion is , the dictionary size increases from to , and the maximum accuracy of the detection results is . In the indoor scene, there are 2975 normal frames and 1057 abnormal frames. In the plaza scene, there are 1831 normal frames and 286 abnormal frames. The experimental procedure for these two scenes is similar to that of the lawn scene. When feature vector is , , , the dictionary size of these two scenes remains . The online strategy thus keeps the memory size almost unchanged as the size of the training dataset increases.

5.2. Abnormal Visual Events Detection: Strategy 2

The results of the abnormal event detection method using Strategy 2 on the UMN dataset are shown as follows. In the experiment on the lawn scene, 100 normal training samples are learnt first, and the other 380 training samples are then learnt online one by one. After these two training steps, we obtain the basic dictionary and the classifier from the training samples. In the subsequent testing step, the dictionary is updated whenever a sample satisfies the dictionary update criterion. When a new sample arrives, it is first classified by the current classifier. If it is classified as an anomaly, the dictionary and the classifier are not changed. Otherwise, if the sample is classified as normal, the sparsification criterion introduced in Section 3 is used to check the coherence between the current dictionary and the new datum, and the datum is included in the dictionary when it satisfies the update condition. The dictionary is thus updated throughout the testing period. The other two scenes, indoor and plaza, are handled in the same way. When feature is adopted, the variance of the Gaussian kernel is , and the preset threshold of the criterion is , and the dictionary sizes for the lawn, indoor, and plaza scenes increase from 100 to 106, 102, and 102, respectively. The ROC curves of the detection results for these three scenes are shown in Figures 8(a), 8(b), and 8(c). Besides the memory savings of Strategy 1, Strategy 2 has the additional advantage of adapting to long-duration sequences.

The performances of the offline strategy, Strategy 1, and Strategy 2 are shown in Table 2. Both online strategies perform similarly to the offline case in which all training samples are learnt together. When or are chosen as the features to form the covariance matrix descriptor, the results are the best. These two feature sets are richer, as they include both movement and intensity information.

The performances of the covariance matrix descriptor based online one-class SVM method and of state-of-the-art methods are compared in Table 3. The covariance matrix based online abnormal frame detection method obtains competitive performance. In general, our method outperforms the others, except sparse reconstruction cost (SRC) [17] in the lawn and indoor scenes. In that work, multiscale HOF is taken as the feature, and a testing sample is classified by its sparse reconstruction cost, through a weighted linear reconstruction over an overcomplete normal basis set. However, computing the HOF may take more time than computing the covariance. Moreover, by adopting the integral image [15], the covariance matrix descriptor of any subimage can be computed efficiently, so the covariance descriptor is also well suited to analyzing local movement.
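The integral-image idea can be sketched as follows: precompute cumulative sums of the features and of their pairwise products, after which the covariance of any axis-aligned rectangle costs a constant number of lookups, using $\sum_k (z_k - \mu)(z_k - \mu)^\top = \sum_k z_k z_k^\top - \frac{1}{n} \big(\sum_k z_k\big)\big(\sum_k z_k\big)^\top$. This NumPy sketch follows the construction described by Tuzel et al. [15]; the function names and the half-open `[top, bottom) x [left, right)` convention are our choices.

```python
import numpy as np


def integral_tensors(features):
    """Integral images of the features (P) and of their outer products (Q)."""
    H, W, d = features.shape
    P = np.zeros((H + 1, W + 1, d))
    Q = np.zeros((H + 1, W + 1, d, d))
    P[1:, 1:] = features.cumsum(axis=0).cumsum(axis=1)
    outer = features[..., :, None] * features[..., None, :]  # (H, W, d, d)
    Q[1:, 1:] = outer.cumsum(axis=0).cumsum(axis=1)
    return P, Q


def region_covariance(P, Q, top, left, bottom, right):
    """Covariance of pixels in rows [top, bottom) and columns [left, right)."""
    n = (bottom - top) * (right - left)
    # Sums over the rectangle from four corner lookups each
    p = P[bottom, right] - P[top, right] - P[bottom, left] + P[top, left]
    q = Q[bottom, right] - Q[top, right] - Q[bottom, left] + Q[top, left]
    return (q - np.outer(p, p) / n) / (n - 1)
```

The price is memory proportional to $W H d^2$ for `Q`, which is why this trick pays off mainly when many subimage descriptors are needed from the same frame.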

6. Conclusions

A method for frame-level abnormal event detection is proposed. The method combines a covariance matrix descriptor encoding the movement features with an online nonlinear one-class SVM classifier. We have developed two nonlinear one-class SVM based abnormal event detection techniques that update the normal model of the surveillance video data in an online framework. The proposed algorithm has been tested on a video dataset and successfully detects abnormal events.

Acknowledgments

This work is partially supported by China Scholarship Council of Chinese Government, SURECAP CPER Project funded by Région Champagne-Ardenne, and CAPSEC CRCA FEDER platform.