Abstract

Public safety has long been a concern of people from all walks of life. With the development of video detection technology, detecting abnormal human behavior in videos has become key to preventing public safety incidents, and among student groups in particular such detection is very important. Most existing abnormal behavior detection algorithms target outdoor activity, and their indoor detection results are not ideal. Students spend most of their time indoors, and modern classrooms are mostly equipped with monitoring equipment. This study therefore focuses on detecting the abnormal behaviors of indoor personnel and uses a new abnormal behavior detection framework to do so. First, a background modeling method based on a Gaussian mixture model segments the background of each video frame. Second, the background-subtracted image is divided into blocks to obtain the space-time block of each frame, and this block serves as the basic representation of the detection object. Third, the foreground image features of each space-time block are extracted. Fourth, fuzzy C-means (FCM) clustering detects outliers in the data samples. The contributions of this paper are as follows: (1) an abnormal human behavior detection framework that is effective indoors is used; compared with existing detection methods, its outdoor detection performance differs little. (2) Compared with other detection methods, the framework achieves a markedly better detection effect for abnormal human behavior indoors. (3) The framework is easy to implement and has low time complexity.
Experimental results on public and manually created data sets demonstrate that the proposed framework performs similarly to the compared methods in outdoor scenarios and has a strong advantage indoors. In summary, the proposed detection framework has good practical application value.

1. Introduction

In recent years, with the frequent occurrence of abnormal group events such as fights, stampedes, riots, and demonstrations, video surveillance equipment has been widely deployed in public places such as railway stations, streets, campuses, and banks. Abnormal behavior detection with traditional video surveillance relies mainly on manual observation; however, long-term continuous observation often fatigues staff and makes them prone to missed inspections. Machine learning [1–3] can be exploited to automate the detection of abnormal human behaviors, saving manpower and reducing missed detections compared with traditional manual methods.

With the rapid development of machine learning and deep learning, intelligent video surveillance is becoming increasingly mature, and several well-known video surveillance systems [4–6] have been proposed and brought to market. Detecting abnormal human behavior in surveillance video has become an important research topic in computer vision in recent years and has attracted widespread attention. Reference [7] summarized the recognition of different postures of the human body during activities. Reference [8] proposed a new technology for human behavior analysis and motion detection in video surveillance scenarios. Reference [9] provided a comprehensive overview of current automated monitoring systems for abnormal behavior detection. Reference [10] proposed a method for estimating abnormal human behavior in different environments based on video surveillance. Reference [11] used a codebook constructed with two-level trees to calculate the similarity between space-time cubes at two scales and then used a linear discriminant analysis model to represent the topics of these scenes; a test sample that does not belong to these topics is considered abnormal. The hidden Markov model [12] is a typical inference model applicable to abnormal behavior detection in video scenes. Reference [13] used independent hidden Markov models to construct sparse features in the training phase, allowing the model to adapt to scene changes by switching to the most representative model. Reference [14] used a fully convolutional neural network (FCNN) for fast abnormal behavior detection. Reference [15] proposed the Appearance and Motion Deep Net (AMDN) and applied it to abnormal behavior detection in videos.

The above research mainly addresses the detection of abnormal human behavior in outdoor scenes. In campus settings, however, students spend most of their time in indoor venues such as classrooms, and the above methods perform poorly when applied to indoor abnormal behavior detection. Moreover, although deep learning-based detection simplifies feature extraction, its detection time cannot meet the needs of realistic applications, and abnormal behavior detection demands high time efficiency. In contrast, machine learning algorithms [16–25] are less time-consuming. Based on this fact, this paper uses a concise detection framework based on machine learning algorithms. First, a background modeling method based on a Gaussian mixture model segments the background of each video frame. Second, the background-subtracted image is divided into blocks to obtain the space-time block image of each frame. Third, the foreground image features of each space-time block image are extracted. Fourth, FCM is used to detect outliers in the data set. The main work of this paper is summarized as follows:

(1) An abnormal human behavior detection framework based on machine learning algorithms is used. Popular deep learning algorithms are also applied to detecting and tracking abnormal pedestrian behavior; however, detection may fail when the crowd density is high or when colors and apparent textures are similar. Furthermore, when running behavior by pedestrians is rare, a large number of training samples cannot be collected. In such situations, deep learning algorithms are unsuitable, and the clustering-based machine learning algorithm used in this article is more applicable.

(2) The detection framework takes the space-time block as the basic representation of a detection object and extracts the foreground motion features of the space-time block image. FCM is used to detect outliers in the data samples and thereby detect abnormal behavior in the crowd.

(3) To demonstrate the effectiveness of the framework, experiments are conducted on public data sets and a newly constructed data set. The results show that the framework's detection effect in indoor environments is significantly better than those of the comparison methods, while its detection effect in outdoor environments is similar to theirs. In summary, for places dominated by indoor scenes, the detection framework used in this paper has better application value.

The remainder of this paper is organized as follows: Section 2 covers related knowledge, including abnormal behavior characteristics, the human abnormal behavior detection process, and typical public data sets. Section 3 presents the abnormal human behavior detection framework, including the framework description, the representation of detection objects, feature extraction, FCM, and the framework's execution steps. Section 4 presents the experimental results and analysis. Section 5 concludes the paper.

2. Related Knowledge

2.1. Abnormal Behavior Characteristics

In the process of human movement, normal behavior usually refers to temporal and spatial states that exhibit a certain repetitiveness and regularity, including walking speed, walking posture, and spatial position. For example, walking and running at a constant speed are considered normal behaviors. However, there is no unified standard for defining abnormal behavior: some scholars hold that any behavior that does not match predefined normal behaviors is abnormal, while others hold that behaviors that rarely occur or have short durations are abnormal. Observation of people's daily behaviors and living habits shows that walking, running, and other behaviors follow certain periodic laws. Therefore, for campus monitoring, we define the following characteristics as abnormal behaviors:

(1) Abnormal walking trajectory: a person's walking trajectory can indicate the purpose of his/her motion, and accurately detecting this behavioral feature is of great significance for safety monitoring systems. For example, before a morning class, most students walk from the cafeteria or dormitory to the classroom, but some students walk in the opposite direction. It is generally believed that people's destinations should be clear. Therefore, this article defines an abnormal walking trajectory as follows: within the camera's field of view, when the walking path of a moving target over a certain period of time appears as a loop, a stop-and-go pattern, or a return pattern, the target exhibits abnormal behavior.

(2) Abnormal walking posture: walking upright is one of the signs of normal human appearance. In daily life, most people walk upright, but other postures occur; for example, a person with a stomachache may bend over and walk slowly. Therefore, we define an abnormal walking posture as follows: when the tested target bends over or suddenly squats while walking, the target's behavior is abnormal.

(3) Abnormal head rotation: according to people's cognitive habits, a normal person usually looks ahead while walking. Therefore, this paper defines abnormal head rotation as follows: when frequent changes in the orientation of a person's head are detected during his/her travel, the behavior is considered abnormal.

2.2. Human Abnormal Behavior Detection Process

The detection process for abnormal human behavior is shown in Figure 1. First, spatiotemporal segmentation is performed on the video to extract features that describe the target area. Then, normal events are modeled during the training phase. In the testing phase, the abnormality of the test features is calculated against the learned normal event model, and a given behavior is judged abnormal or not according to a preset abnormality threshold. The two steps of feature extraction and model-based abnormal behavior detection have a great impact on the detection effect.

(1) Preprocessing: during the preprocessing stage, the raw video data are prescreened. For example, video data normalization and foreground extraction are performed to eliminate useless information in the image and enhance the detectability of valuable information. The data are simplified as much as possible, and their reliability is enhanced for subsequent processing and analysis.

(2) Feature extraction: the features used to describe crowd movement fall mainly into two categories: trajectory segmentation features and local spatiotemporal features. Most studies in the literature use local spatiotemporal features to extract further features, and the extracted motion information mainly includes gradient features and optical flow motion features.

(3) Anomaly detection model: normal events are modeled during training, and the testing phase judges the abnormality of test features against the model. The model commonly used at this stage is the reconstruction-based abnormal behavior detection model. This type of method trains on normal behavior features to obtain a model of normal behavior; under this model, the reconstruction error for anomalies is large, while the reconstruction error for normal features is small. A behavior is therefore judged by comparing the reconstruction error of its features under the normal behavior model with an abnormality threshold. Such methods can be subdivided into two categories: reconstruction models based on sparse coding and reconstruction models based on deep learning.
2.3. Typical Public Data Set

Recently, the number of publicly available data sets for abnormal behavior detection has increased. The typical public data sets are introduced in Table 1.

3. Abnormal Human Behavior Detection Framework

3.1. Inspection Framework Description

The detection framework used in this paper is shown in Figure 2. As shown in the figure, the video data are preprocessed first. In this stage, the mixed Gaussian background modeling method [30] is mainly used to eliminate the background of each frame of a given image. In the second stage, the foreground image features of each space-time block image are extracted. In the third stage, FCM is used to cluster the extracted feature data sets. In the fourth stage, outliers are obtained according to the clustering results to determine whether the image frame is an abnormal frame.
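The preprocessing stage can be sketched in code. The paper uses mixture-of-Gaussians background modeling [30]; as a minimal, hedged stand-in (not the authors' implementation), the sketch below keeps an exponential running-average background per pixel and thresholds the per-pixel difference to obtain a foreground mask. The function names, the learning rate `alpha`, and the threshold are illustrative assumptions:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running-average background update (a simplified
    stand-in for the per-pixel Gaussian mixture model of [30])."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25.0):
    """Binary foreground mask: pixels that deviate strongly from the
    current background model."""
    return (np.abs(frame - bg) > thresh).astype(np.uint8)

# Toy example: an empty background and one bright 20 x 20 "object".
bg = np.zeros((240, 320), dtype=np.float64)
frame = bg.copy()
frame[100:120, 150:170] = 200.0
mask = foreground_mask(bg, frame)     # 1 inside the object, 0 elsewhere
bg = update_background(bg, frame)     # background slowly absorbs the frame
```

A full mixture model (several Gaussians per pixel) handles multimodal backgrounds such as flickering lights better than this single running average, which is why the paper adopts it.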

3.2. Representation of Detection Objects

Taking the space-time block as the detection object, each video frame is first divided into W ∗ H spatial blocks, each of size L ∗ L. Because the focus is on analyzing the movements of individual body parts, such as the impact of a running person's legs, the criterion for choosing L is that a single spatial block should represent a specific part of a moving target. A space-time block is composed of the spatial blocks at the same position in multiple consecutive frames. Using the space-time block as the basic representation of the detection object is intended not to extract motion information directly from the current space-time block, but to further analyze the motion effect that a space-time block with rich foreground information exerts on the surrounding space-time blocks.
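The block representation above can be sketched as follows: a frame is divided into a grid of L × L spatial blocks, and a space-time block stacks the blocks at one grid position over m consecutive frames. The block size and frame dimensions here are illustrative assumptions:

```python
import numpy as np

def spatial_blocks(frame, L):
    """Split a frame into an (H//L) x (W//L) grid of L x L spatial blocks."""
    H, W = frame.shape
    return frame[:H - H % L, :W - W % L].reshape(H // L, L, W // L, L).swapaxes(1, 2)

def space_time_block(frames, i, j, L):
    """Stack the (i, j) spatial block of m consecutive frames into one
    space-time block of shape (m, L, L)."""
    return np.stack([spatial_blocks(f, L)[i, j] for f in frames])

# Five toy frames; each frame is filled with its own index t.
frames = [np.full((240, 320), t, dtype=np.float64) for t in range(5)]
stb = space_time_block(frames, 2, 3, L=16)        # shape (5, 16, 16)
```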

3.3. Feature Extraction

The features of a space-time block can be described with histogram statistics of low-level visual information such as the gray gradient, optical flow, and texture of the block [31]. Pedestrians exhibiting "panic running" move fast and carry great kinetic energy, so their effect on the surrounding spatial environment is more significant than that of a pedestrian walking normally. The speed of pedestrian movement, the range of influence of that movement, and the distance between the pedestrian and the surrounding spatial blocks are the main factors that determine this effect. The feature extraction process for running foreground images based on space-time blocks is shown in Figure 3.

As shown in Figure 3, the foreground motion image is first extracted from the video frame sequence by the adaptive Gaussian mixture model, and the spatial blocks are obtained. Each spatial block is combined with the foreground image and preprocessed to obtain the foreground motion blocks. Then, the motion representation of each foreground motion block is obtained from the dense optical flow of the video frame. Finally, the effect weight vector contributed by all foreground motion blocks is calculated for each spatial block, and the effect weight vectors of the spatial blocks over multiple consecutive frames are averaged to obtain the feature description of the space-time block. Each step is described in detail below.

3.3.1. Foreground Motion Block

A foreground motion block can effectively represent the movement information of pedestrians: it is a spatial block in which foreground points appear. However, the foreground information contained in some spatial blocks may be noise, which cannot correctly characterize the motion behavior of an object. To exclude such blocks, this paper preprocesses all spatial blocks: a spatial block is retained as a foreground motion block only when the number of foreground points it contains satisfies the threshold condition in equation (1).

When this condition is met, the image block is used as the j-th foreground motion block; the threshold is the comparison threshold for foreground points. This operation constitutes the preprocessing of the foreground image, and foreground motion blocks are extracted from the preprocessed foreground image.
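A hedged sketch of this preprocessing step: count the foreground pixels in each spatial block of a binary foreground mask and keep only the blocks whose count reaches the comparison threshold of equation (1). The threshold value `t_f` and the grid layout are illustrative assumptions:

```python
import numpy as np

def foreground_motion_blocks(mask, L, t_f):
    """Return grid indices (i, j) of spatial blocks whose
    foreground-pixel count meets the threshold t_f (equation (1));
    blocks with only a few noisy foreground pixels are discarded."""
    H, W = mask.shape
    grid = mask[:H - H % L, :W - W % L].reshape(H // L, L, W // L, L).swapaxes(1, 2)
    counts = grid.sum(axis=(2, 3))                # foreground count per block
    return [tuple(ij) for ij in np.argwhere(counts >= t_f)]

mask = np.zeros((64, 64), dtype=np.uint8)
mask[0:16, 0:16] = 1                              # one dense foreground block
mask[40, 40] = 1                                  # isolated noise pixel
blocks = foreground_motion_blocks(mask, L=16, t_f=50)   # noise block rejected
```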

3.3.2. Foreground Motion Effects Map

The optical flow vector of all pixels in a preprocessed spatial block is extracted, and the average value is used as the optical flow vector of the current block. This is used as the motion representation of the foreground motion block, as follows:

Here, the optical flow vector of the i-th foreground motion block is obtained by averaging the optical flow vectors of all J pixels in the block, and its magnitude and direction represent, respectively, the speed and direction of motion of the i-th foreground motion block.
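This motion representation can be sketched as follows, assuming a per-pixel optical flow field for the block (e.g., from a dense optical flow algorithm) is already available; the averaging and the conversion to magnitude and direction follow the description above:

```python
import numpy as np

def block_flow(flow_block):
    """Average the per-pixel optical flow vectors of a foreground
    motion block and return the (magnitude, direction) of the mean
    vector, used as the block's motion representation."""
    v = flow_block.reshape(-1, 2).mean(axis=0)    # mean (dx, dy) over pixels
    return float(np.hypot(v[0], v[1])), float(np.arctan2(v[1], v[0]))

# Toy 16 x 16 block in which every pixel moves 3 px right and 4 px down.
flow = np.tile(np.array([3.0, 4.0]), (16, 16, 1))
mag, ang = block_flow(flow)                       # mag = 5.0
```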

3.3.3. Feature Extraction

According to the obtained subblock effect measurement, the subblock effects of normal behavior and of abnormal behavior can be effectively distinguished. Therefore, this paper uses a foreground motion effects map feature to characterize the motion effects of neighboring foreground blocks on space-time blocks.

For a spatial block and a foreground motion block, two index variables are defined to measure whether the foreground motion block has an effect on the surrounding spatial block:

The first index variable is the Euclidean distance between the foreground motion block and the spatial block, compared against a distance threshold. The second is the angle between the vector from the foreground block to the spatial block and the optical flow direction of the foreground block, compared against the field of view of the foreground motion block while in motion. Together, these two index variables measure whether the spatial block lies within the influence range of the foreground block.

The effect weight of the foreground motion block on the spatial block is defined as

When the spatial block is within the influence range of the foreground block, the effect weight it receives is inversely proportional to the distance between the two and directly proportional to the magnitude of the foreground block's optical flow; when a pedestrian runs vigorously, the weight increases with the optical flow magnitude. Foreground blocks moving in different directions contribute different weights to a spatial block, so all of the effect weights received by the spatial block must be aggregated to form an effective feature representation. To increase the discriminability of the extracted features and the computational efficiency of the model, the moving direction of each foreground motion block is quantized, where p is the total number of quantization direction intervals and the quantization index identifies the direction interval of the i-th foreground motion block's optical flow. The effect weights generated by the foreground blocks for a spatial block are then accumulated into a histogram according to the quantized optical flow directions of the foreground motion blocks:

Substituting equations (4) and (5) into equation (6) yields the histogram statistics. To capture long-term statistics of the foreground motion effect and make the extracted features highly discriminative, the spatial blocks of m consecutive frames are taken as one space-time block. The feature description of each spatial block within it is calculated according to equation (6), and the running foreground effect feature of the space-time block is obtained by averaging the feature descriptions of the m frames of spatial blocks, as follows:
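The feature construction above can be sketched, under stated assumptions, as follows: for one spatial block, each nearby foreground motion block contributes a weight proportional to its optical flow magnitude and inversely proportional to its distance, provided the spatial block lies within the foreground block's distance threshold and field of view; the weights are accumulated into a histogram indexed by the quantized flow direction. The thresholds `d_max` and `fov`, the bin count `p`, and the exact weight formula are illustrative assumptions, since the paper's equations (3)–(6) are not reproduced here:

```python
import numpy as np

def effect_histogram(ps, fg_blocks, d_max=5.0, fov=np.pi / 2, p=8):
    """Histogram of effect weights received by spatial block ps from
    nearby foreground motion blocks, binned by each foreground block's
    quantized optical flow direction."""
    hist = np.zeros(p)
    for pf, mag, ang in fg_blocks:                # position, flow magnitude, direction
        d = np.hypot(ps[0] - pf[0], ps[1] - pf[1])
        if d == 0.0 or d > d_max:
            continue                              # outside the distance threshold
        to_s = np.arctan2(ps[1] - pf[1], ps[0] - pf[0])
        diff = np.abs((to_s - ang + np.pi) % (2 * np.pi) - np.pi)
        if diff > fov / 2:
            continue                              # outside the field of view
        k = int((ang % (2 * np.pi)) / (2 * np.pi) * p) % p
        hist[k] += mag / d                        # weight ~ magnitude / distance
    return hist

fg = [((0.0, 0.0), 4.0, 0.0),                     # moving toward the spatial block
      ((0.0, 0.0), 4.0, np.pi)]                   # moving away: contributes nothing
h = effect_histogram((2.0, 0.0), fg)
```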

3.4. FCM

FCM is a classic distance-based clustering algorithm. Given a data set of n p-dimensional samples, the data are grouped into K categories, each with a cluster center; the centers form the cluster center matrix, and the membership matrix satisfies the constraint that each sample's memberships over the K categories sum to 1. The objective function of FCM is

J = \sum_{i=1}^{n} \sum_{k=1}^{K} u_{ik}^{m} \| x_i - c_k \|^{2},

where m is the fuzzy coefficient (m = 2 in this paper), u_{ik} is the membership of sample x_i in category k, and c_k is the k-th cluster center. Minimizing J completes the division of the sample set. Iteratively updating J with the Lagrange multiplier method yields the formulas for the memberships and the cluster centers:

u_{ik} = 1 / \sum_{j=1}^{K} ( \| x_i - c_k \| / \| x_i - c_j \| )^{2/(m-1)},  (9)

c_k = \sum_{i=1}^{n} u_{ik}^{m} x_i / \sum_{i=1}^{n} u_{ik}^{m}.  (10)

Equations (9) and (10) are iterated repeatedly until the FCM algorithm converges. FCM is then used to distinguish outliers among the feature sample points of the space-time blocks to be detected. For a video frame sequence, the feature sample set of all space-time blocks located at position (i, j) is extracted; FCM clusters this sample set with the optimal number of clusters K, yielding the cluster center set.
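The two update formulas can be sketched directly in code; this is the textbook FCM iteration (centers then memberships, alternated to convergence) rather than the authors' exact implementation, with m = 2 as in the paper:

```python
import numpy as np

def fcm(X, K, m=2.0, iters=100, seed=0):
    """Fuzzy C-means: alternate the center update (eq. (10)) and the
    membership update (eq. (9)) for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), K))
    U /= U.sum(axis=1, keepdims=True)                # memberships sum to 1
    for _ in range(iters):
        Um = U ** m
        C = Um.T @ X / Um.sum(axis=0)[:, None]       # eq. (10): centers
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        inv = D ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)     # eq. (9): memberships
    return U, C

# Two well-separated pairs of points should form two fuzzy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
U, C = fcm(X, K=2)
```

In practice the loop would stop when the change in U falls below a tolerance rather than after a fixed number of iterations.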

For an image frame T to be detected, the feature of the space-time block located at (i, j) is obtained, and its distance to the cluster center set is calculated through equation (11):

A distance threshold is set; when the calculated distance exceeds this threshold, the feature is an outlier, and the space-time block at (i, j) is marked as abnormal. When the number of abnormal points in an image frame is greater than the total number of foreground blocks, the frame is considered an abnormal frame.

3.5. Detection Framework Execution Steps

The execution steps of the detection framework used in this paper are as follows:

(1) Given a video sequence, the feature set is obtained by extracting the features of the space-time block at each position (i, j).

(2) The FCM algorithm clusters the feature sample set to obtain the cluster center set.

(3) The foreground motion image feature of the space-time block at (i, j) in the frame T to be detected is calculated, along with its distance to the cluster center set.

(4) When this distance exceeds the threshold, the feature is regarded as an outlier.

(5) The number of outliers in the frame is counted; if it is greater than the total number of foreground blocks, the frame is determined to be an abnormal frame.
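Steps (3)–(5) can be sketched as follows; the distance threshold and the frame-level comparison against the number of foreground blocks follow the description above, with illustrative values:

```python
import numpy as np

def abnormal_frame(features, centers, dist_thresh, n_fg_blocks):
    """A space-time block feature is an outlier when its distance to
    the nearest cluster center exceeds dist_thresh (equation (11));
    the frame is abnormal when the outlier count exceeds the
    foreground-block count used as the frame-level threshold."""
    d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    n_out = int((d > dist_thresh).sum())
    return n_out, n_out > n_fg_blocks

centers = np.array([[0.0, 0.0], [5.0, 5.0]])      # from the FCM training stage
feats = np.array([[0.1, 0.0], [5.0, 5.1], [9.0, 9.0], [12.0, 0.0]])
n_out, is_abn = abnormal_frame(feats, centers, dist_thresh=2.0, n_fg_blocks=1)
```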

The flowchart of the detection framework used is shown in Figure 4.

4. Experimental Results and Analysis

4.1. Experiment-Related Instructions

To demonstrate the feasibility and superiority of the proposed detection model, the public UMN data set and a manually created data set are used in the experiments. The UMN data set is a public data set for anomaly detection research that mainly includes sudden movements, crowd appearances and disappearances, and aggregations. It contains 7739 frames in total, each of size 320 × 240, at a low resolution, covering three types of scenes: the first scene contains 2 groups of complete abnormal data, the second contains 6 groups, and the third contains 3 groups. The comparison methods are taken from [32], [33], and [34], with parameter settings consistent with those in the respective references. The evaluation indicators are the detection accuracy [35] and the area under the ROC curve (AUC) [36].
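The AUC indicator [36] can be computed from frame-level anomaly scores and ground-truth labels; one standard way (a hedged sketch, not the authors' evaluation code) uses the rank-sum (Mann-Whitney) statistic, which equals the probability that a randomly chosen abnormal frame scores higher than a randomly chosen normal frame:

```python
import numpy as np

def auc_score(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) statistic, with ties
    counted as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):                   # average ranks over ties
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    return float((ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

# Two normal frames (label 0) and two abnormal frames (label 1).
auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```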

4.2. UMN Data Set Experiment

The first 180 frames of 9 video clips in three scenes are extracted as the training set, and the rest are extracted as the test set. The detection accuracy and AUC values of each method under different scenarios are shown in Table 2.

The experimental data in Table 2 show that the detection methods perform quite differently across scenarios; what they have in common is that all algorithms achieve their best detection performance on Scene 1. From the average results over the three scenes, the detection accuracy and AUC obtained by the method in this paper are both greater than those of the comparison methods: the detection accuracy is improved by 5.2% over SF, 1.3% over STC, and 2.0% over MA, and the AUC is increased by 2.0% over SF, 1.9% over STC, and 1.5% over MA. These two indicators demonstrate that the method in this paper outperforms the other methods in detecting abnormal human behavior on the UMN data set.

To explore whether the selection of the clustering algorithm in the detection framework is appropriate, we replace the FCM at the end of the framework with the classic K-means clustering algorithm [37]. The impacts of these two different clustering algorithms on the detection performance of this detection framework are compared. Two different clustering algorithms are applied to the detection framework to detect human abnormalities in the UMN data set. The experimental results are shown in Table 3.

The experimental results show that the detection effect obtained by using FCM in the detection framework is better than that obtained by using K-means. This is because FCM is not sensitive to noise in the data, so the detection result is not easily affected by noise either, thereby improving the detection accuracy. This is the reason why this paper chooses FCM for use in the detection framework.

4.3. Self-Made Data Set Experiment

To further verify the robustness of the method in this paper, we downloaded multiple videos of normal and abnormal group behavior in different scenarios from the Internet and randomly combined them to construct 35 videos. The videos include anomalous group incidents such as indoor pushing and trampling and outdoor robberies and terrorist attacks. Of the 35 videos, 25 are used as the training set and 10 as the test set. The experimental results on the manually created data set are shown in Table 4.

In indoor scenes, the proposed method achieves the best detection accuracy and AUC on the manually created data set, which fully demonstrates its superiority for indoor scene detection. In outdoor scenes, the MA algorithm performs best, but the proposed method is close behind, and it outperforms SF and STC. This shows that the indoor detection performance of the method is significantly improved while its outdoor detection performance remains competitive, fully meeting the needs of actual application scenarios.

5. Conclusion

Abnormal behavior detection in videos is a research hotspot in the field of smart security. Abnormal behaviors are defined differently depending on the surveillance scene and the objects being monitored, and detecting abnormal crowd behavior in public places has high research value. This research focuses on the analysis and detection of abnormal crowd behaviors in videos, such as "panic running." For these behaviors, a detection framework based on machine learning algorithms is used. First, the space-time block is used as the basic representation of the detection object. Second, a background modeling algorithm is used to analyze the effect of each foreground motion block on the surrounding spatial blocks and to calculate the foreground motion effects map features of the space-time blocks. Finally, the FCM algorithm is used for clustering, training, and outlier detection; when the number of detected outliers is greater than the number of foreground blocks, the detected frame is determined to be abnormal. The experimental results show that, compared with existing detection algorithms, the proposed method has better detection performance. However, many parameters of the detection framework must be determined manually, which is a drawback of this research and a direction for future work.

Data Availability

The labeled data sets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This study was supported by the Scientific Research Project of Jilin Education Department.