Abstract

Effective analysis of abnormal human behavior can serve as a warning signal before emergencies. At present, however, most abnormal human behavior detection relies on manual monitoring, which is criticized for being subjective and lacking timeliness. In response to these problems, this paper proposes a multistage analysis method for abnormal human behavior in complex scenes. The method first roughly separates abnormal behavior from a large monitoring area by applying similarity measurement to the social force model, and a precise analysis is conducted thereafter. The precise analysis, based on the three-frame difference algorithm, is used for intrusion detection, left-behind baggage detection, and motion trajectory identification. The experimental results demonstrate the superiority of the proposed method on the UMN, CAVIAR, and other datasets. To demonstrate the adaptability and generalization ability of the proposed method, this paper also tests it on the CVC and JAAD driving anomaly detection datasets. Experimental results show that the proposed method is superior to the existing methods.

1. Introduction

Behavior analysis, first introduced in psychology, uses psychological knowledge to describe, analyze, predict, and control an individual’s external manifestations. The ways in which human beings control their actions can be divided into two levels: self-control and social control. The goal of self-control is to adapt to the scene by reducing behavioral deficiencies or behavioral excesses. Abnormal human behavior means that the behavior of an individual deviates from social norms, rules, or conventions. Analysis of abnormal behavior, which focuses on its causes, forms, and potential hazards, can contribute to the prevention of unusual events and to loss mitigation.

A video security monitoring system uses video as a means of detection to monitor, control, and record information. When an unusual event occurs, the video surveillance system calls up the images related to the alarm region so that the condition on the spot can be observed and reconfirmed. Current video surveillance systems combine the latest research achievements in computer vision, sensor networks, and related fields. They can be used in various settings, including government facilities, transportation hubs, hospitals, and schools. However, current video surveillance systems rarely have a prediction function and rely on personal judgment. Because human emotions are complicated, manual video surveillance is prone to subjectivity and to lag in decision-making.

A single control model cannot meet current social needs. Multistage decision-making refers to a process in which a problem is decomposed chronologically into several interrelated stages, with a decision made at each stage. The whole process is therefore a decision sequence. The multistage model has strong reliability and robustness and a wider range of applications.

To solve the above problems, this paper proposes a multistage analysis method for abnormal human behavior in complex scenes. Our approach works in three steps. First, we make a rough analysis of the monitoring area using the social force similarity measurement method: through the interaction between social force and individual desire force, the method differentiates normal and abnormal behavior in the monitoring area. Second, we propose a multimodel human behavior analysis method based on the three-frame difference algorithm. Finally, intrusion detection, left-behind baggage detection, and motion trajectory identification are performed specifically in the unusual region. The main contributions of this paper are as follows: (1) The similarity measure model is introduced into the social force model, reducing the amount of computation in behavior detection. (2) A multimodel human behavior detection approach is established to improve the robustness of the analysis. (3) A large number of experiments have been carried out to evaluate the performance of this approach. The results show that our approach is superior to the existing methods.

The rest of this paper is arranged as follows. The second part provides an overview of related work. The third part builds a social force similarity measurement model for human behavior region detection. The fourth part conducts a multimodel abnormal human behavior analysis. The fifth part analyzes the experimental results. Finally, the sixth part summarizes the paper.

2. Related Work

The correct analysis of abnormal human behavior can serve as an effective warning signal before emergencies and hence minimize the damage caused by unexpected events. Many scholars have made outstanding contributions to the analysis of abnormal human behaviors in complex scenes. Lan et al. [1] proposed a multifeature sparse representation framework for visual tracking, in which multiple sparse models were decomposed to distinguish different features. Most CNN-based trackers use hierarchical features to represent targets. However, hierarchical features are not completely effective in separating the target from the background, especially in complex interference environments such as heavy occlusion and illumination variation. To solve these problems, Qi et al. [2] proposed a CNN-based tracking algorithm that hedges deep features from different CNN layers to better distinguish target objects from background clutter. Experimental results show that this target-tracking method is effective. For autonomous driving applications, traditional tracking methods usually distinguish between a target and the background by training a support vector machine (SVM). When the training samples are nonlinear, the tracker may fail to track the target. To address this problem, Zhang et al. [3] proposed a distance learning method based on deep learning. Experiments have shown that, even without model updates, the proposed method achieves favorable performance on challenging images. Zhang et al. [4] proposed a scheme using density-based clustering and a backtracking strategy to detect small dim targets. The algorithm extracts the trace with the highest clutter suppression ratio, increasing the accuracy of target detection. ElMikaty and Stathaki [5] introduced rotation invariance for aerial images of urban scenes. A feature detection method that analyzes the principle of the wavelet transform as applied to signal filtering was proposed to obtain high detection performance for small targets in complex multimedia images [6]. Asif et al. [7] presented an object segmentation approach for indoor scenes using a perceptual grouping algorithm.

When it comes to establishing a multistage model, the grey relational analysis method and an objective optimization model based on maximum entropy have been used to make decision-making more comprehensive [8]. Tammi et al. [9] showed that combining clustering techniques with a neural network achieved higher accuracy in detecting attacks. To address the image forgery problem, an algorithm [10] was designed to classify image blocks based on a feature present in multicompressed JPEG images. Huang et al. [11] presented a bottom-up framework for salient object detection without any prior knowledge. The approach was more effective in highlighting the salient object and more robust to background noise.

In terms of human behavior analysis, Iosifidis et al. [12] demonstrated a novel nonlinear subspace learning technique for class-specific data representation, which was obtained by applying nonlinear class-specific data projection to a discriminant feature space. Guided by kinematics, Xiao et al. [13] proposed a data-driven approach to identify typical head motion patterns and emphasized analyzing human behavioral characteristics via signal processing methods. Human dynamics provides new ways of researching human behavior by exploiting statistical analysis. The interval time series and the number series of individuals’ operating behavior were investigated with a modified multiscale entropy algorithm, which provided insights for further understanding individual behavior at different time scales [14].

With respect to abnormal human behavior detection, pedestrian behavior modeling and analysis in video surveillance are important for crowd scene understanding. Yi et al. [15] demonstrated reliable discrimination between normal and abnormal pedestrian behavior by simulating pedestrian behavior and established two large pedestrian walking route datasets for future research. Hu et al. [16] proposed detecting abnormal driving, which may have tragic consequences for individuals and the public, by analyzing normalized driving behavior. Driver anomaly detection is very important in the field of automatic driving because it can alert the driver to danger. Such detection is extremely difficult because of camera shake, sharp changes in vehicle speed, and similar factors. In response to these problems, Yuan et al. [17] proposed a spatial local constrained sparse coding method for anomaly detection in traffic scenes. According to the experimental results, the proposed method is more efficient than popular competitors and achieves higher performance. By employing the 3D discrete cosine transform of the target in different frames, Yuan et al. [18] designed a multiobject tracker that makes tracking analysis possible in high-density crowds. Experimental results on several public crowd video datasets verify the effectiveness of the proposed method. A taxonomy of novel classes of neighbor-based trajectory outlier definitions was designed to detect abnormal moving objects. The experimental results showed that the method worked well for high-volume trajectory streams in near real time [19].

To sum up, scholars have proposed many efficient solutions for the analysis of abnormal human behavior. However, some problems remain to be resolved: (1) manual monitoring is prone to subjectivity, and (2) there is a time lag in the discrimination of abnormal human behavior.

3. Human Behavior Region Detection: a Rough Analysis

In human behavior detection, human group behaviors are difficult to describe, and there is no precise definition of abnormal human behavior. Abnormal behavior is judged by its deviation from group behavior characteristics. In large-scale video surveillance, occlusion, low resolution, and interaction between people make it difficult to detect abnormal human behavior. The LDA model [20, 21] can draw on prior category knowledge and experience for dimension reduction and distinguishes normal and abnormal frames in the video by using a threshold value. Unfortunately, the result tends to overfit the data. Therefore, this paper proposes an abnormal human behavior detection method based on a social force similarity measurement model, which determines whether a person’s behavior is abnormal according to its consistency with the surrounding scene. LDA-H adds the hash similarity measure model [22] to LDA to solve the overfitting problem. Only two kinds of data, normal and abnormal behavior, are necessary for analyzing human behavior. LDA-H projects the data into a low-dimensional space, then uses the similarity measurement model to make the projection points of each data category as close as possible, and finally determines the normal and abnormal human behaviors.

First, each image in the UMN dataset is preprocessed, with the image binarized and size adjusted to . Secondly, feature extraction is carried out on to obtain the feature vector , where represents a feature point vector in the graph. The compression of the feature matrix is achieved by summing the multiperspective feature matrices.

This paper uses the social force model [23] to quantify H. In a given scene, people usually act with a specific purpose. Thus, the algorithm must take into account the action potential field model of every person present in the scene. If a pedestrian’s normal speed is and his expected speed is , the model of personal desire force in the scene is defined as follows: where is the similarity threshold in the similarity metric model. The interaction force between individual behavior and the scene could be divided into individual desire force and social influence . Therefore, this paper defines the interaction force as follows:

The influence of on the similarity measurement of social forces will result in an opposite disturbance, i.e., the interaction force will fluctuate and the abnormal behavior will happen. Thus, the individual abnormal behavior model is derived as where is the constant parameter of the social force model in case of anomaly. is a normal pedestrian’s trend scalar. A similarity measurement model is associated with the social force model to measure the interaction force similarity of all human beings in the dataset.
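The relationship between desire force and interaction force described above can be illustrated with a minimal sketch. It assumes the standard social force formulation, in which the desire force is (desired velocity minus actual velocity) divided by a relaxation time and the interaction force is the remainder of mass times acceleration. This is an assumption for illustration, not necessarily the exact model of this paper, and the names `tau`, `mass`, and `dt` are hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float]

def desire_force(v_actual: Vec, v_desired: Vec, tau: float = 0.5) -> Vec:
    """Personal desire force: pulls the actual velocity toward the desired one."""
    return ((v_desired[0] - v_actual[0]) / tau,
            (v_desired[1] - v_actual[1]) / tau)

def interaction_force(v_prev: Vec, v_curr: Vec, v_desired: Vec,
                      dt: float = 1.0, mass: float = 1.0, tau: float = 0.5) -> Vec:
    """Interaction force = mass * dv/dt - desire force (assumed standard form)."""
    ax = (v_curr[0] - v_prev[0]) / dt
    ay = (v_curr[1] - v_prev[1]) / dt
    fpx, fpy = desire_force(v_curr, v_desired, tau)
    return (mass * ax - fpx, mass * ay - fpy)

def magnitude(f: Vec) -> float:
    """Euclidean norm of a 2D force vector."""
    return (f[0] ** 2 + f[1] ** 2) ** 0.5

# A pedestrian already moving at the desired velocity feels no interaction force.
steady = interaction_force((1.0, 0.0), (1.0, 0.0), (1.0, 0.0))
# A pedestrian who is abruptly stopped while wanting to move feels a large one.
blocked = interaction_force((1.0, 0.0), (0.0, 0.0), (1.0, 0.0))
print(magnitude(steady), magnitude(blocked))
```

Under this assumed form, a large interaction force magnitude indicates that a person's motion is inconsistent with their own intention, which is the kind of fluctuation the text associates with abnormal behavior.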

The particle motion model of each pedestrian is defined as , and the individual expected speed of particle motion is

Finally, based on a given scenario, the interaction force of each person in a group is estimated to be

In this paper, the image information is mapped to 0 or 1 according to the characteristic similarity threshold of human behavior : 1 and 0 stand for ’s normal behavior and abnormal behavior, respectively. Finally, the hash value is obtained.

represents the motion threshold of similar particles of all objects. When human behavior characteristics are greater than or equal to , the model determines that the characteristics are similar and outputs the normal behavior. In contrast, when the human behavior characteristics are less than , the model determines that the characteristics are different and outputs abnormal behaviors, as shown in Figure 1.
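The thresholding rule above (similar features map to 1 and are treated as normal, dissimilar features map to 0 and are treated as abnormal) can be sketched as follows. The function names and the similarity scores are hypothetical, and how the similarity score is derived from the interaction forces is not modeled here.

```python
from typing import List, Sequence

def similarity_hash(similarities: Sequence[float], threshold: float) -> List[int]:
    """Map each similarity score to 1 (normal) if >= threshold, else 0 (abnormal)."""
    return [1 if s >= threshold else 0 for s in similarities]

def abnormal_indices(bits: Sequence[int]) -> List[int]:
    """Indices whose hash bit is 0, i.e. the entries flagged as abnormal."""
    return [i for i, b in enumerate(bits) if b == 0]

# Hypothetical similarity scores for four people or regions in one frame.
scores = [0.91, 0.88, 0.35, 0.90]
bits = similarity_hash(scores, threshold=0.6)
print(bits, abnormal_indices(bits))
```

Only the entries flagged here would be passed on to the precise analysis stage, which is what keeps the second stage cheap.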

4. Abnormal Human Behavior Detection: a Precise Analysis

In the detection of abnormal human behavior, wide-range rough detection is far from enough. A precise analysis is needed to make an effective judgment on abnormal human behavior. The target, which is marked in red in the previous step, is subsequently tracked and monitored. During target tracking, we put forward a multimodel method based on the three-frame difference method to detect abnormal human behavior accurately.

4.1. Intrusion Detection

Intrusion refers to a mobile target, such as a person or an object, entering a designated area without permission. In the monitoring area, the motion of the target has a strong purpose. As a result, there are obvious changes between consecutive frames, and the position of the target is different in each frame. The interframe difference method computes the difference between two or three temporally consecutive frames. Corresponding pixels in different frames are subtracted to obtain the absolute value of the gray-level difference. When the absolute value exceeds a certain threshold, the pixel is determined to belong to a moving target, which achieves the detection goal. The two-frame difference method is suitable for small and slow targets. When the target moves rapidly, the position difference of the target between the two frames is too large, and image ghosting is likely to occur. Therefore, this paper applies the three-frame difference method to abnormal human behavior analysis. Let the images of frame , frame , and frame in the video sequence be , , and , respectively. The corresponding gray value of pixel points should be , , and . Then, subtract the gray value to obtain the difference in image and . In this paper, we optimize the three-frame difference method, that is, by adding the difference between the third frame and the first frame:

Threshold processing and connectivity analysis are carried out on the difference image and to obtain the extracted target, as shown in Figure 2.
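The differencing and thresholding steps can be sketched as follows on tiny grayscale images stored as nested lists. The two consecutive differences are combined with the classical logical AND. How the added first-to-third difference enters the final mask is not recoverable from the text, so this sketch simply returns it separately; the function names and the threshold value are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of 0..255 values

def abs_diff_mask(a: Image, b: Image, threshold: int) -> Image:
    """Binary mask: 1 where |a - b| exceeds the threshold, else 0."""
    return [[1 if abs(pa - pb) > threshold else 0 for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def three_frame_difference(prev: Image, curr: Image, nxt: Image,
                           threshold: int = 25) -> Tuple[Image, Image]:
    """Return the classical three-frame mask and the extra first-to-third mask."""
    d1 = abs_diff_mask(curr, prev, threshold)  # change between frames 1 and 2
    d2 = abs_diff_mask(nxt, curr, threshold)   # change between frames 2 and 3
    d3 = abs_diff_mask(nxt, prev, threshold)   # added term: frames 1 and 3
    # Classical rule: a pixel belongs to the target in the middle frame
    # if it changed in both consecutive differences.
    target = [[p & q for p, q in zip(r1, r2)] for r1, r2 in zip(d1, d2)]
    return target, d3

# A bright one-pixel object moving right across a dark 1x5 strip.
f1 = [[200, 0, 0, 0, 0]]
f2 = [[0, 0, 200, 0, 0]]
f3 = [[0, 0, 0, 0, 200]]
target, d3 = three_frame_difference(f1, f2, f3)
print(target)
print(d3)
```

In this toy case the AND of the two consecutive differences isolates the object at its middle-frame position, while the first-to-third difference marks its start and end positions. Connectivity analysis on the resulting mask is omitted.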

4.2. Left-Behind Baggage Detection

Left-behind baggage refers to the situation in which items such as bags or backpacks are left unattended in the monitoring area for more than a certain period. To detect left-behind items more accurately, the Sobel operator is introduced into the three-frame difference method to detect the edges of the target. The purpose of this method is to determine the presence of left-behind items quickly and effectively when an item is left unattended. In this paper, the Sobel operator detects horizontal edges. Compared with similar algorithms, the Sobel operator weights pixel contributions by position, effectively reducing edge blur. Combined with the three-frame difference method, set the moving target as and the left-behind object model as

To improve the reliability of the left-behind warning, this paper adds the condition that the pedestrian has disappeared to the left-behind object model. The method then provides an early warning when a pedestrian leaves the surveillance area. Therefore, the left-behind warning model is updated to

When the person and the object are separated but the pedestrian remains in the monitoring area, the model determines that the behavior is normal. When the person and the object are separated and the pedestrian leaves the monitored area, the model determines that the behavior is abnormal and promptly issues an early warning, as shown in Figure 3.
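The two ingredients of this subsection, horizontal edge detection with the Sobel operator and the warning rule that fires only once the owner has left, can be sketched as follows. The kernel is the standard Sobel kernel for horizontal edges. The function names, the frame-count parameter `min_static_frames`, and its default value are hypothetical and do not come from the paper.

```python
from typing import List

Image = List[List[int]]

# Standard Sobel kernel that responds to horizontal edges
# (intensity changes in the vertical direction).
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]

def sobel_horizontal_edges(img: Image) -> Image:
    """Absolute Sobel response at interior pixels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += SOBEL_Y[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = abs(acc)
    return out

def left_behind_alarm(object_static_frames: int, owner_in_view: bool,
                      min_static_frames: int = 50) -> bool:
    """Alarm only if the object stayed put long enough and the owner has left."""
    return object_static_frames >= min_static_frames and not owner_in_view

# Dark top half, bright bottom half: one horizontal edge between rows 1 and 2.
img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [100, 100, 100, 100],
       [100, 100, 100, 100]]
edges = sobel_horizontal_edges(img)
print(edges[1][1], edges[2][1])
print(left_behind_alarm(80, owner_in_view=True),
      left_behind_alarm(80, owner_in_view=False))
```

The second rule mirrors the two cases described above: a separated object with its owner still in view is treated as normal, and the same object with the owner gone triggers the warning.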

4.3. Motion Trajectory Identification

With the improvement of video analysis, single or local image information cannot meet the requirements of precise analysis. Exploiting the property that pixels in a space-time region are spatially smooth, this paper introduces the spatial context model [24] into the three-frame difference algorithm and proposes a new trajectory analysis model.

According to the three-frame difference method, is the target of monitoring. The confidence map of each frame in the video is obtained by the STC algorithm.

Based on the spatial context, the conditional probability is the spatial relationship between the target and the residual scene. We define spatial context information as where is the remaining area excluding the target and is the image feature of the target at the point . The position of maximum confidence in the confidence map within the spatial region is the target position in the frame.

STC is introduced into the three-frame difference method. While the target is monitored, its position is tracked and predicted from the maximum of the confidence map within the region, and the motion trajectory of the target is marked, as shown in Figure 4.
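The final localization step, taking the position of maximum confidence in each frame and chaining these positions into a trajectory, can be sketched as follows. In an actual STC tracker the confidence map is computed from the learned context model and the current frame. Here it is synthesized from a known center with an assumed exponential fall-off, purely to illustrate the argmax step; `alpha`, `beta`, and both function names are hypothetical.

```python
import math
from typing import List, Tuple

def confidence_map(h: int, w: int, center: Tuple[int, int],
                   alpha: float = 2.25, beta: float = 1.0) -> List[List[float]]:
    """Synthetic confidence map that peaks at `center` and decays with distance."""
    cy, cx = center
    return [[math.exp(-((math.hypot(y - cy, x - cx) / alpha) ** beta))
             for x in range(w)] for y in range(h)]

def argmax_position(conf: List[List[float]]) -> Tuple[int, int]:
    """(row, column) of the largest value in the confidence map."""
    best = (0, 0)
    for y, row in enumerate(conf):
        for x, v in enumerate(row):
            if v > conf[best[0]][best[1]]:
                best = (y, x)
    return best

# Three frames in which the (synthetic) target moves diagonally.
centers = [(1, 1), (2, 3), (3, 5)]
trajectory = [argmax_position(confidence_map(5, 7, c)) for c in centers]
print(trajectory)
```

The resulting list of positions is the kind of trajectory that the later experiments classify as regular or wandering.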

Based on the three-frame difference method, combined with the Sobel operator and the spatial context relationship, a multimodel analysis method for abnormal human behavior is established. This method can accurately analyze and determine abnormal human behaviors such as intrusion, left-behind baggage, and irregular motion trajectories, and raise alerts in time.

5. Simulation Experiment

To verify the validity of this method, we select UMN, CAVIAR, and other datasets to test the algorithm.

Hardware configuration: the experiments are simulated in MATLAB under Windows 10. The simulation runs on a small server with an E5-2630 v4 CPU clocked at 2.2 GHz and 32 GB of memory.

Evaluation index: to verify the effectiveness of the method, both qualitative and quantitative criteria are used. In the qualitative assessment, we show the calibration of anomalous behavior in the datasets. In the quantitative assessment, the pixel-wise receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are employed. The ROC curve represents detection capability and reflects the relationship between the TPR and the FPR. TPR is the true positive rate, TPR = TP/P, where TP is the number of true positive pixels and P is the number of positive pixels. FPR is the false positive rate, FPR = FP/N, where FP is the number of false positive pixels and N is the number of negative pixels.
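The quantitative criterion can be made concrete with a short sketch that sweeps a decision threshold over per-pixel anomaly scores, computes TPR = TP/P and FPR = FP/N at each threshold, and integrates the resulting ROC curve with the trapezoidal rule. The scores and labels below are made-up illustrative values, not results from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold over all distinct scores.

    Assumes the labels contain at least one positive (1) and one negative (0).
    """
    P = sum(labels)
    N = len(labels) - P
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / N, tp / P))
    return points

def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical anomaly scores for six pixels; label 1 marks an abnormal pixel.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts))
```

An AUC of 1.0 means every abnormal pixel scores higher than every normal pixel, and 0.5 corresponds to random ranking.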

When conducting intrusion detection in the monitoring area, we first mark the control area in red. If there is any abnormality, the abnormal pedestrians are marked with green boxes, as shown in Figure 5. Video 1 and Video 3 monitor corridors. When students are in class or have been dismissed after school, there should be no one in the red area. If a target intrudes into the red area, the proposed method determines that this action is an illegal intrusion. Video 2 and Video 4 monitor public areas. Lawns, rooftops, and public areas are marked as red forbidden areas. If a target breaks into the red area, the proposed method marks it in time and gives a warning. Video 5 belongs to the UCSD dataset. To ensure the safety of the walking crowd, we treat all motion states other than walking as abnormal. The simulation results show that the algorithm effectively identifies foreign object intrusions in the video, such as bicycles, scooters, and minivans.

According to the experimental results, the proposed method is applicable not only to indoor scenes but also to outdoor scenes. When the target enters the monitoring region, our method can identify the target quickly and effectively.
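The intrusion decision itself reduces to a geometric test between a detected target and the forbidden (red) region. A minimal sketch, assuming both are axis-aligned rectangles in pixel coordinates, is given below. The rectangle representation, the coordinates, and the function name are assumptions for illustration, since the paper does not state how the region is encoded.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels

def intrudes(target: Box, forbidden: Box) -> bool:
    """True if the target's bounding box overlaps the forbidden region."""
    return (target[0] < forbidden[2] and forbidden[0] < target[2] and
            target[1] < forbidden[3] and forbidden[1] < target[3])

red_zone = (100, 50, 300, 200)      # hypothetical forbidden region
walker_outside = (10, 60, 40, 120)  # hypothetical detections
walker_inside = (280, 150, 320, 230)
print(intrudes(walker_outside, red_zone), intrudes(walker_inside, red_zone))
```

A target flagged by this test would be drawn with the green box described above.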

Table 1 shows the AUC comparison of different algorithms under different datasets.

In the UMN dataset, the method proposed in [17] shows the best results. In the CAVIAR and UCSD datasets, our method demonstrates the best results. Averaged over the three datasets, the method proposed in [17] shows excellent results, and our method differs from it by only 0.0003.

When conducting the left-behind baggage experiment, we select surveillance data from parks, streets, and bus stops. This paper presents a method combining edge detection and the three-frame difference method, which can not only detect the separation between the owner and the object quickly but also determine the left-behind item accurately through the connection between video frames. The detected item is denoted with a red dot, as shown in Figure 6. When the pedestrian leaves the monitoring screen, the algorithm judges it as abnormal behavior and issues a warning in time.

No matter where a person is going, his behavior must have a certain purpose. Therefore, the movement trajectory is regular, i.e., when people do something with a purpose, the movement trajectory tends to be a straight line. In contrast, aimless behavior leads to a wandering trajectory, as shown in Figure 7. The outputs of Videos 1, 2, and 3 are marked abnormal because their motion trajectories are wandering, whereas the trajectories in Video 4 and Video 5 are close to straight lines and are marked normal.
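One simple way to quantify the distinction between a near-straight and a wandering trajectory is a straightness index: the net displacement divided by the total path length. This measure, the cutoff `min_straightness`, and the function names are assumptions chosen for illustration, since the decision rule used in the experiments is not specified.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def straightness(points: List[Point]) -> float:
    """Net displacement over path length: 1.0 for a straight line, near 0 for loops."""
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if path == 0:
        return 1.0  # a stationary target has no wandering to measure
    return math.dist(points[0], points[-1]) / path

def is_wandering(points: List[Point], min_straightness: float = 0.7) -> bool:
    """Flag a trajectory as wandering when its straightness falls below the cutoff."""
    return straightness(points) < min_straightness

direct = [(0, 0), (1, 1), (2, 2), (3, 3)]
loop = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 1)]
print(is_wandering(direct), is_wandering(loop))
```

The looping path covers 7 units of distance but ends only 1 unit from its start, so it is flagged, while the diagonal path is not.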

In order to verify the generalization ability of the algorithm, we also selected the CVC-08 dataset and the JAAD data set for driving anomaly detection. The video dataset is captured by a car camera and the viewing angle is basically the same as the driver’s daily driving. At the same time, in order to prove the adaptability of the algorithm to different targets such as people and cars, we keep the resolution of each frame of the video consistent with the above.

The results of the car-driving anomaly detection are shown in Figure 8. The CVC-08 dataset shows pedestrian detection while the car is driving. The experimental results show that the algorithm can effectively identify when a pedestrian enters and leaves the zebra crossing. The JAAD-01 video shows bicycles crossing the road. The experimental results show that the algorithm can recognize the target when it is far from the camera. The JAAD-03 video shows a car changing lanes. When the car moves from the rightmost lane to the left lane, the algorithm can effectively identify the vehicle among multiple targets. The JAAD-04 video shows pedestrians crossing the road at night. It demonstrates that the algorithm can still effectively identify the target in the absence of illumination.

Table 2 shows the AUC comparison of driving anomaly detection on CVC and JAAD. Both CVC-08 and JAAD-04 involve detecting pedestrians crossing the road. Under the normal weather conditions of CVC-08, the method proposed in [16] works best. Under the poor lighting of JAAD-04, the method of this paper shows the best results. In the JAAD-02 video, the method has the best recognition effect on bicycles. However, in the JAAD-04 video, the method of this paper is not as good as the method proposed in [16]. Finally, in the JAAD-03 video, the method demonstrates its applicability to multiple targets.

6. Conclusions

This paper proposes a multistage analysis of abnormal human behavior in complex scenes to improve on the previous approach, which relies on manual monitoring and is subject to time lag. First, we roughly differentiate the normal and abnormal areas of human behavior in video surveillance according to the social force similarity measurement model. Second, based on the three-frame difference method, a multimodel suitable for intrusion detection, left-behind baggage detection, and motion trajectory identification is established to perform a precise analysis of abnormal human behavior. The experimental results show that the proposed method achieves good results in abnormal human behavior analysis and driving anomaly detection, which demonstrates the adaptability and generalization ability of the proposed method.

Data Availability

The dataset contains confidential information such as the performance parameters and tactical technical indicators of abnormal human behavior. Therefore, the dataset used in this paper is confidential and cannot be released.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Acknowledgments

This paper is supported by the National Natural Science Foundation of China (61703143), Science and Technology Department of Henan Province (192102310260), Scientific and Technological Innovation Talents in Xinxiang (CXRC17004), the young backbone teacher training project of Henan University (2017GGJS123), and the Science and Technology Major Special Project of Xinxiang City (ZD18006).