Abstract

Human multipose motion behaviors include many actions that are similar to one another, which makes abnormal behavior difficult to recognize. Existing methods for recognizing anomalies in human motion behavior suffer from low accuracy and long processing times. Therefore, an anomaly recognition method for human multipose motion behavior based on a Generative Adversarial Network (GAN) is proposed. A Gaussian mixture model is used to segment the human multipose motion behavior image, and the foreground of the segmentation result is taken as the human motion target detection result. The Shi-Tomasi algorithm is then used to extract contour feature points from the detected human motion targets. The extracted contour features are set as latent-space random variables and input into the GAN, whose generator and discriminator are used to recognize multipose human motion behavior and determine whether it is abnormal. The results show that the proposed algorithm accurately recognizes abnormal human multipose motion behavior, with a recognition accuracy higher than 99% and an average recognition time of less than 200 ms. The foreground images obtained by the proposed algorithm show effective shadow removal, which supports accurate recognition of abnormalities in human multipose motion behavior and provides a reliable basis for research in related fields.

1. Introduction

Human posture behavior recognition is a research topic in the field of video analysis whose goal is to classify specific human actions [1]. Human abnormal behavior detection is a branch of human behavior recognition that recognizes specific behaviors in different scenes, such as fighting in a bank. Human posture behavior recognition technology has been widely used in assisted medical treatment, video surveillance, and other areas. Recognizing abnormalities in multipose motion behavior requires analyzing human posture to determine whether an abnormality is present [2].

Abnormal recognition of multipose motion behavior analyzes the specific position and behavior of a moving target within a fixed time window [3, 4], enabling automatic recognition and localization of human posture. It can be applied in many fields, such as medical treatment and security. Human motion posture is highly nonlinear, has many degrees of freedom [5, 6], and is highly diverse and complex, so accurately recognizing abnormalities in human multipose motion behavior is challenging. Feature extraction of human motion posture is therefore critical [7], because the accuracy of feature extraction determines the accuracy of multipose motion behavior recognition. Features can be extracted from the texture, color, contour, and other properties of human motion posture [8, 9]. Because human motion is formed by a sequence of postures, extracting human contour features from images is the more practical choice.

The main contributions of this paper are as follows. (1) A new GAN-based algorithm for recognizing anomalies in human multipose motion behavior is proposed, which first detects human motion targets in human multipose motion images. (2) To address the many similar actions in human multipose motion behavior, the human motion region and the background region of the image are segmented, and human motion posture features are extracted from the human motion target region, which reduces the difficulty of behavior anomaly recognition. (3) A GAN is used to obtain the anomaly recognition results for human multipose motion behavior. The network is flexible to apply and has strong learning ability, and applying it to this task improves recognition performance.

At present, there is a large body of research on motion recognition. Literature [10] applies convolutional neural networks (CNNs) to human motion recognition and proposes three different CNN structures. First, four information channels are generated from each frame using optical flow and gradients in the horizontal and vertical directions, and these channels are fed to a three-dimensional (3D) CNN. Three architectures are then proposed: single-stream, dual-stream, and four-stream 3D CNNs. In the single-stream model, all four information channels of each frame are applied to a single stream. In the dual-stream structure, the horizontal and vertical optical flow channels are applied to one stream, and the horizontal and vertical gradient channels are applied to the other. In the four-stream architecture, each information channel is applied to its own independent stream. The three architectures are then compared by evaluating the resulting action recognition systems.

Literature [11] applied shape temporal dynamics to human motion recognition and proposed a depth-based, view-invariant human motion recognition framework that integrates two important cues: motion and shape temporal dynamics (STD). The motion stream encapsulates the motion content of an action into RGB dynamic images, which are generated by approximate rank pooling (ARP) and processed by a fine-tuned Inception-V3 model. The STD stream uses a series of long short-term memory (LSTM) and Bi-LSTM models to learn the long-term, view-invariant shape dynamics of actions. A human pose model (HPM) generates view-invariant features from key depth human pose frames selected with the structural similarity index matrix (SSIM). The final prediction is made by fusing the scores of the individual streams with three late-fusion techniques: maximum, average, and multiplication. Literature [12] proposed an effective method for human action recognition (HAR) from silhouette image sequences in video. The effectiveness of this method lies in its feature extraction and action classification. The method includes scale and translation normalization and the removal of distorted contours, which are used to extract a newly introduced spatiotemporal feature, the active region energy feature (AREF), together with trajectory analysis. In addition, the method uses low-dimensional feature vectors, which keeps the time cost of the whole process low. Results on the public Weizmann and MuHAVi data sets verify the efficiency of the method in terms of human behavior detection accuracy compared with related work. Literature [13] proposed a human behavior anomaly detection method that combines deep learning with handcrafted features. First, the key points of the human 3D skeleton in a time series are extracted by YOLOv4, and the mean shift target tracking algorithm is applied. Then, the key points are transformed into spatial RGB representations and fed into a multilayer convolutional neural network to recognize abnormal behaviors such as hitting, throwing, climbing, and approaching. Literature [14] uses deep learning techniques, including CNN and LSTM networks, to build a deep network within a multiview framework that learns long-term dependencies for human behavior recognition from video. Two cameras are used as sensors to overcome occlusion and blurred contours and to improve the accuracy and performance of the multiview framework. After a series of preprocessing steps on the raw data, human contour images are obtained as the input of the training model. Literature [15] combines a spatiotemporal CNN with handcrafted feature sets for anomaly detection in continuous video frames. The handcrafted features are sparse features extracted from moving human image units, including moving pixels, which reduces computing costs. The CNN architecture is used to extract spatiotemporal features and recognize abnormal human behavior. However, many human multipose motion behaviors are similar to one another, and there are many types of actions, so the existing methods above suffer from low accuracy and long processing times.

3. Methodology

3.1. Detecting Human Moving Targets

Multipose human motion images contain background factors such as shadows and illumination changes, which make it difficult to extract the human target from the image. The human motion target detection process, which separates the human motion target from the image background, is as follows.

3.1.1. Establish Gaussian Model

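Let $X_t$ be the color of a random pixel in the human multipose motion image at time $t$. The probability density function of the pixel color is modeled as a mixture of $K$ Gaussian distributions:

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta\left(X_t, \mu_{i,t}, \Sigma_{i,t}\right) \tag{1}$$

where $\mu_{i,t}$ and $\omega_{i,t}$ are the mean and weight of the $i$th Gaussian distribution at time $t$, respectively, and $\Sigma_{i,t}$ and $\eta$ are its covariance matrix and Gaussian probability density, respectively. According to equation (1), the pixel color probability density function uses $K$ Gaussian distributions. At time $t$, the $i$th Gaussian distribution is

$$\eta\left(X_t, \mu_{i,t}, \Sigma_{i,t}\right) = \frac{1}{(2\pi)^{n/2}\left|\Sigma_{i,t}\right|^{1/2}} \exp\left[-\frac{1}{2}\left(X_t-\mu_{i,t}\right)^{\mathrm{T}} \Sigma_{i,t}^{-1}\left(X_t-\mu_{i,t}\right)\right] \tag{2}$$

where $n$ is the dimension of the pixel color $X_t$ and $i = 1, 2, \ldots, K$.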
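As a minimal sketch, assuming NumPy and illustrative parameter values that are not taken from the paper, the mixture density of equations (1) and (2) can be evaluated for one RGB pixel as follows. The component means, weights, and the isotropic covariance are hypothetical choices for demonstration only.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Eq. (2): n-dimensional Gaussian density eta(x, mean, cov)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = x.shape[0]                      # dimension of the pixel color
    diff = x - mean
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu), solved without an explicit inverse.
    mahal = diff @ np.linalg.solve(cov, diff)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * mahal) / norm)

def mixture_pdf(x, weights, means, covs):
    """Eq. (1): weighted sum of the K Gaussian components."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

# Hypothetical K = 3 model for one RGB pixel (values are made up, not from the paper).
weights = [0.6, 0.3, 0.1]
means = [np.array([120.0, 110.0, 100.0]),
         np.array([60.0, 60.0, 60.0]),
         np.array([200.0, 190.0, 180.0])]
covs = [np.eye(3) * 30.0 ** 2] * 3      # isotropic covariance with sigma = 30
p = mixture_pdf(np.array([118.0, 112.0, 101.0]), weights, means, covs)
```

Solving the linear system instead of inverting $\Sigma$ is the usual numerically safer way to compute the quadratic form.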

3.1.2. Update Model

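Let the value of a random pixel in the image be $X_t$. The $i$th Gaussian distribution is considered to match the pixel value when

$$\left|X_t - \mu_{i,t-1}\right| \le D\,\sigma_{i,t-1} \tag{3}$$

where $\sigma_{i,t-1}$ is the standard deviation of the $i$th distribution and $D$ is a deviation multiplier (2.5 is the value commonly used for Gaussian mixture background models).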

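The parameters are updated as follows:

$$\begin{aligned}
\omega_{i,t} &= (1-\alpha)\,\omega_{i,t-1} + \alpha M_{i,t},\\
\mu_{i,t} &= (1-\rho)\,\mu_{i,t-1} + \rho X_t,\\
\sigma_{i,t}^2 &= (1-\rho)\,\sigma_{i,t-1}^2 + \rho\left(X_t-\mu_{i,t}\right)^{\mathrm{T}}\left(X_t-\mu_{i,t}\right),\\
\rho &= \alpha\, \eta\left(X_t, \mu_{i,t-1}, \Sigma_{i,t-1}\right)
\end{aligned} \tag{4}$$

where $\alpha$ and $\rho$ are the adjustable learning rate and the parameter learning rate, respectively, and $M_{i,t}$ is 1 for the matched distribution and 0 otherwise; the mean and variance are updated only for the matched distribution.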
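The matching test and update rules can be sketched for a single grayscale pixel. This is a minimal illustration in the style of the Stauffer-Grimson update, assuming scalar variances ($\Sigma = \sigma^2 I$), a deviation multiplier $D = 2.5$, and a replace-the-weakest-component rule when nothing matches. The function name `update_pixel_model` and all default values are hypothetical, not the authors' code.

```python
import numpy as np

def update_pixel_model(x, weights, means, variances, alpha=0.01, d=2.5,
                       init_var=900.0, init_weight=0.05):
    """One online update of a K-component grayscale mixture for one pixel.

    Assumptions (illustrative only): scalar variances, deviation multiplier d,
    and init_var / init_weight for a component created when nothing matches.
    """
    w = np.array(weights, dtype=float)
    mu = np.array(means, dtype=float)
    var = np.array(variances, dtype=float)
    # Eq. (3): a component matches if x lies within d standard deviations.
    matches = np.abs(x - mu) <= d * np.sqrt(var)
    if matches.any():
        k = int(np.argmax(matches))           # first matching component (a simplification)
        m = np.zeros_like(w)
        m[k] = 1.0                            # M_{i,t}: 1 for the match, 0 otherwise
        w = (1 - alpha) * w + alpha * m       # weight update of Eq. (4)
        # rho = alpha * eta(x, mu, sigma^2), evaluated with the pre-update parameters.
        eta = np.exp(-0.5 * (x - mu[k]) ** 2 / var[k]) / np.sqrt(2 * np.pi * var[k])
        rho = alpha * eta
        mu[k] = (1 - rho) * mu[k] + rho * x
        var[k] = (1 - rho) * var[k] + rho * (x - mu[k]) ** 2   # uses the updated mean
    else:
        # No match: replace the least probable component with a new, wide one.
        k = int(np.argmin(w))
        mu[k], var[k], w[k] = x, init_var, init_weight
    w /= w.sum()                              # keep the weights normalized
    return w, mu, var
```

For example, `update_pixel_model(105.0, [0.7, 0.3], [100.0, 200.0], [100.0, 100.0])` matches the first component, raises its weight slightly, and nudges its mean toward 105.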

3.1.3. Segmented Image

All weights are normalized, and the Gaussian distributions are sorted in descending order of weight. The number $B$ of leading distributions is chosen to satisfy

$$B = \arg\min_{b}\left(\sum_{i=1}^{b} \omega_{i,t} > T\right) \tag{5}$$

The first $B$ distributions model the background of the human multipose motion image, and $T$ is the weight threshold.

Each pixel is then checked against the established Gaussian mixture model [16]: a pixel that matches one of the first $B$ (background) distributions is a background pixel, whereas a pixel that matches none of them is a foreground pixel [17]. Through this process, the foreground and background of the human multipose motion image are segmented, and the moving targets in the image are detected.
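The background selection of equation (5) and the resulting per-pixel foreground test can be sketched as below. This is a minimal illustration for grayscale pixels; the threshold $T = 0.7$ and the multiplier 2.5 are hypothetical values chosen only for the example, and the function names are not from the paper.

```python
import numpy as np

def background_components(weights, T=0.7):
    """Eq. (5): sort components by weight (descending) and keep the first B
    whose cumulative weight first exceeds T (assumes 0 < T < 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    order = np.argsort(-w)                 # indices sorted from largest weight
    cumulative = np.cumsum(w[order])
    B = int(np.argmax(cumulative > T)) + 1
    return order[:B]

def is_foreground(x, weights, means, variances, T=0.7, d=2.5):
    """A pixel is background if it matches one of the first B components,
    and foreground otherwise (d = 2.5 is an illustrative multiplier)."""
    bg = background_components(weights, T)
    mu = np.asarray(means, dtype=float)[bg]
    sd = np.sqrt(np.asarray(variances, dtype=float)[bg])
    return not bool(np.any(np.abs(x - mu) <= d * sd))
```

With weights `[0.1, 0.6, 0.3]` and $T = 0.7$, the two heaviest components (indices 1 and 2) form the background model, so a gray value close to either of their means is labeled background.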

3.2. Extracting Contour Feature Points

After the foreground and background of the human multipose motion image are segmented, contour feature points must be extracted from the foreground to provide the basis for GAN-based anomaly recognition of human multipose motion behavior. Harris corner detection is the most widely used method for extracting image contour feature points, where a corner is the intersection of two edges. However, Harris corner detection requires an empirical constant $k$, and its computation is relatively complex. Because the stability of the Harris detector depends on $k$, and $k$ is set empirically, its optimal value is difficult to choose. The Shi-Tomasi algorithm was proposed to address this: Shi and Tomasi found that the stability of a corner is actually related to the smaller eigenvalue of the autocorrelation matrix, so the smaller eigenvalue is used directly as the corner score and no constant needs to be tuned. The Shi-Tomasi algorithm is therefore simpler to apply, and the contour feature points it obtains are also accurate.
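To make the role of the empirical constant concrete, the sketch below scores an edge-like 2 × 2 autocorrelation matrix $M$ (the matrix of equation (9)) with both detectors. It assumes NumPy, and $k = 0.04$ is a commonly quoted Harris value rather than one taken from the paper. The Harris score changes sign as $k$ varies, while the Shi-Tomasi score needs no constant.

```python
import numpy as np

def harris_score(M, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2, with empirical constant k."""
    return float(np.linalg.det(M) - k * np.trace(M) ** 2)

def shi_tomasi_score(M):
    """Shi-Tomasi response: the smaller eigenvalue of the symmetric 2x2 matrix M."""
    return float(np.linalg.eigvalsh(M)[0])   # eigvalsh returns ascending eigenvalues

M_edge = np.array([[10.0, 0.0], [0.0, 1.0]])  # strong gradient along x only
print(harris_score(M_edge, k=0.04))  # about 5.16, positive
print(harris_score(M_edge, k=0.10))  # about -2.1, negative
print(shi_tomasi_score(M_edge))      # about 1.0, the smaller eigenvalue
```

The same matrix is judged differently by Harris depending on $k$, which is exactly the tuning problem the Shi-Tomasi score avoids.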

The Shi-Tomasi algorithm is used to extract contour feature points from the human motion image. Let $(x, y)$ be the coordinates of a random point in the grayscale multipose human motion image and $I(x, y)$ the gray value at that point. A window $w(x, y)$ of fixed size centered on this point is established and translated by $(u, v)$; the resulting change in gray value is

$$E(u,v) = \sum_{x,y} w(x,y)\left[I(x+u, y+v) - I(x,y)\right]^2 \tag{6}$$

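Expanding $I(x+u, y+v) \approx I(x,y) + u I_x + v I_y$ as a Taylor series and discarding terms of $E$ higher than second order gives

$$E(u,v) \approx \sum_{x,y} w(x,y)\left[u I_x + v I_y\right]^2 \tag{7}$$

where $I_x$ and $I_y$ are the partial derivatives of the human multipose motion image in the $x$ and $y$ directions, and $w(x,y)$ is the Gaussian filter.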

Equation (7) can be rewritten in matrix form:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} \tag{8}$$

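The matrix $M$ in equation (8) is

$$M = \sum_{x,y} w(x,y)\begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \tag{9}$$

The Shi-Tomasi corner score is the smaller of the two eigenvalues of $M$, $\min(\lambda_1, \lambda_2)$.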
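Equations (7) to (9) can be sketched as a dense response map. This is a minimal NumPy illustration, not the authors' implementation: it assumes `np.gradient` for $I_x$ and $I_y$, a separable Gaussian window with $\sigma = 1$ built from `np.convolve` (zero-padded borders), and an image larger than the kernel. The helper names are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(sigma=1.0, radius=None):
    """Normalized 1-D Gaussian kernel (radius defaults to 3 * sigma)."""
    radius = radius if radius is not None else int(3 * sigma)
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-0.5 * (t / sigma) ** 2)
    return g / g.sum()

def smooth(img, g):
    """Separable Gaussian window w(x, y) of Eq. (9), zero-padded at the borders."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, rows)

def shi_tomasi_response(img, sigma=1.0):
    """Smaller eigenvalue of M at every pixel (assumes img is larger than the kernel)."""
    img = np.asarray(img, dtype=float)
    Iy, Ix = np.gradient(img)             # derivatives along rows (y) and columns (x)
    g = gaussian_kernel_1d(sigma)
    Sxx = smooth(Ix * Ix, g)
    Syy = smooth(Iy * Iy, g)
    Sxy = smooth(Ix * Iy, g)
    # Closed-form smaller eigenvalue of [[Sxx, Sxy], [Sxy, Syy]].
    half_trace = 0.5 * (Sxx + Syy)
    disc = np.sqrt(0.25 * (Sxx - Syy) ** 2 + Sxy ** 2)
    return half_trace - disc
```

On a bright square, the response is largest near the square's corners and essentially zero along its straight edges and in flat regions.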

The Shi-Tomasi algorithm extracts contour feature points from human multipose motion images according to the following rules:
(1) Select a pixel, establish a window of fixed size centered on it, and move the window in different directions [18].
(2) Compare the resulting changes in gray value. If the gray value stays essentially unchanged when the window moves in any direction, the area is flat [19] and contains no feature point. If the gray value changes only slightly when the window moves along one particular direction, the area is a straight-line (edge) area.
(3) If the gray value changes greatly when the window moves in any direction [20], the center of the window is a feature point of the human motion image, and this feature point can be used for anomaly recognition of human multipose motion behavior.
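The three rules can be illustrated on small synthetic patches through the eigenvalues of $M$. This sketch assumes a uniform window (a plain sum instead of the Gaussian weights of equation (9)) and two hypothetical thresholds, `flat_thresh` and `corner_ratio`, which are not parameters from the paper.

```python
import numpy as np

def classify_window(patch, flat_thresh=1e-3, corner_ratio=0.1):
    """Classify a grayscale patch as flat, edge, or corner from the eigenvalues of M.

    flat_thresh and corner_ratio are illustrative assumptions, and the window
    is uniform rather than Gaussian for simplicity.
    """
    patch = np.asarray(patch, dtype=float)
    Iy, Ix = np.gradient(patch)
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    lam_min, lam_max = np.linalg.eigvalsh(M)     # ascending order
    if lam_max < flat_thresh:
        return "flat"      # rule (2): gray value barely changes in any direction
    if lam_min < corner_ratio * lam_max:
        return "edge"      # rule (2): change in one direction only
    return "corner"        # rule (3): large change in every direction

flat = np.full((7, 7), 5.0)
edge = np.zeros((7, 7)); edge[:, 4:] = 1.0        # vertical step edge
corner = np.zeros((7, 7)); corner[3:, 3:] = 1.0   # bright quadrant
print(classify_window(flat), classify_window(edge), classify_window(corner))
```

Expressing the rules through eigenvalues is what lets the Shi-Tomasi detector reduce them to a single threshold on the smaller eigenvalue.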

3.3. Anomaly Recognition of GAN

The discriminator of the GAN performs the final recognition of human multipose motion behavior. It acts as a classifier [21] that judges whether an input sample is generated data or real data. GAN is an important deep generative learning algorithm, and when it is used to recognize human multipose motion behavior, recognition results can be obtained quickly [22]. The learning process of the GAN requires no manual intervention; accurate recognition results are obtained by evaluating the model after training. The structure of the GAN is shown in Figure 1.

Let $x$ and $z$ be the sample data and random noise input to the GAN, respectively, and let $G$ and $D$ be its generator and discriminator. The GAN objective is

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{10}$$

where $G(z)$ is the generated data, $D(\cdot)$ is the discriminator's judgment (the probability that its input is real data), and $V(D,G)$ is the objective function.

The discriminator and the generator play a minimax game: the discriminator maximizes $V(D,G)$, while the generator minimizes it. The parameters of the discriminator and the generator are optimized over multiple iterations until the two reach a Nash equilibrium.
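Equation (10) can be estimated from a batch of discriminator outputs, which makes the equilibrium condition easy to check numerically. In the standard GAN analysis, when the generated distribution equals the data distribution, the optimal discriminator outputs 1/2 everywhere and $V = -\log 4$. The sketch below assumes NumPy and discriminator outputs already in $(0, 1)$; the function name and the clipping constant are illustrative.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) in Eq. (10):
    mean(log D(x)) + mean(log(1 - D(G(z)))).
    Outputs are clipped away from 0 and 1 so the logarithms stay finite."""
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

# At equilibrium D outputs 0.5 for every sample, giving V = -log 4 (about -1.386).
v_eq = gan_value([0.5] * 4, [0.5] * 4)
# A discriminator that separates real from generated samples well scores higher.
v_sharp = gan_value([0.9], [0.1])
```

The discriminator's update pushes this value up, and the generator's update pushes it back toward $-\log 4$.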

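With the minimax objective function, the GAN is prone to vanishing gradients, so the objective can fail to update the generator [23], which reduces the stability of the GAN. The least squares GAN (LSGAN) method is therefore adopted to address the training instability. LSGANs replace the sigmoid cross-entropy loss of the discriminator with a least-squares loss, which also penalizes generated samples that lie far from the decision boundary and keeps gradients flowing to the generator. The objective function of the GAN discriminator optimized by LSGANs is

$$\min_{D} V_{\text{LSGAN}}(D) = \frac{1}{2}\mathbb{E}_{x\sim p_{\text{data}}(x)}\left[\left(D(x)-b\right)^2\right] + \frac{1}{2}\mathbb{E}_{z\sim p_z(z)}\left[\left(D(G(z))-a\right)^2\right] \tag{11}$$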

The objective function of the GAN generator is

$$\min_{G} V_{\text{LSGAN}}(G) = \frac{1}{2}\mathbb{E}_{z\sim p_z(z)}\left[\left(D(G(z))-c\right)^2\right] \tag{12}$$

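Here $a$ and $b$ are the target labels for generated and real data, respectively, and $c$ is the value that the generator wants the discriminator to assign to generated data. These parameters need to satisfy the following conditions:

$$b - c = 1, \qquad b - a = 2 \tag{13}$$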
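The least-squares losses of equations (11) and (12) can be sketched on batches of discriminator outputs. This is a minimal NumPy illustration, assuming the labels $a = -1$, $b = 1$, $c = 0$, which satisfy the conditions in equation (13) and are one of the choices suggested in the LSGAN literature; the paper does not state which values it uses, and the function name is hypothetical.

```python
import numpy as np

def lsgan_losses(d_real, d_fake, a=-1.0, b=1.0, c=0.0):
    """Least-squares GAN losses for Eqs. (11) and (12).

    a: target for generated samples in the discriminator loss,
    b: target for real samples, c: value the generator wants D to output.
    The defaults satisfy b - c = 1 and b - a = 2 (Eq. (13)); they are an
    illustrative choice, not values taken from the paper.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    loss_d = 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)
    loss_g = 0.5 * np.mean((d_fake - c) ** 2)
    return float(loss_d), float(loss_g)

# A discriminator that hits its targets exactly has zero loss, while the
# generator is still penalized for samples scored far from c.
loss_d, loss_g = lsgan_losses([1.0], [-1.0])
```

Unlike the log loss in equation (10), the squared penalty does not saturate for confidently rejected samples, which is the source of LSGAN's more stable gradients.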

Through this process, the uncertainty introduced by GAN training is reduced [24], the diversity of the generated samples is effectively improved, and the recognition accuracy for abnormalities in human multipose motion behavior increases.

The GAN is applied to the abnormal recognition of human multipose motion behavior. The process of abnormal recognition is shown in Figure 2.

As Figure 2 shows, the multipose human image whose behavior is to be checked for abnormalities is first segmented. The segmentation yields the human motion target detection results, from which the contour features of human motion behavior are extracted; the extracted features are set as latent-space random variables and input into the GAN. The GAN then outputs the anomaly recognition result, determining whether the multipose human motion behavior is abnormal.

3.4. Data Sets and Evaluating Index

To verify the effectiveness of the proposed GAN-based algorithm in recognizing abnormal human multipose motion behavior, the Occlusion_person 3D data set and the CMU Panoptic data set are selected as test data sets. The Occlusion_person 3D data set contains 4.8 million 3D human postures with corresponding images, covering 200 participants and 23 action scenes. The CMU Panoptic data set was produced by Carnegie Mellon University (CMU) and captured with 480 VGA cameras, more than 30 HD cameras, and 10 Kinect sensors. Both are typical human posture data sets and include walking, running, kicking, jumping, standing, squatting, hands up, reverse, head down, head up, and other representative postures. Forty thousand images were randomly selected from the two data sets: 20,000 images are used for training, and the remaining 20,000 are used for testing and analysis. The number of images for each behavior posture is shown in Table 1. Abnormal behavior in the images is recognized on a simulation platform.

Accuracy, precision, and recall are selected as the evaluation indexes for the recognition performance of the proposed algorithm. They are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{14}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{15}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{16}$$

where $TP$ and $FN$ are the numbers of abnormal behaviors that the algorithm recognizes as abnormal and as nonabnormal, respectively, and $FP$ and $TN$ are the numbers of nonabnormal behaviors that the algorithm recognizes as abnormal and as nonabnormal, respectively. Accuracy and precision measure the overall recognition level of the algorithm and its ability to avoid misrecognition, while recall reflects how completely the algorithm identifies abnormal motion behavior.
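The three indexes can be computed directly from the confusion counts. The sketch below uses hypothetical counts (not results from the paper) for a test set in which abnormal behaviors are a small minority, as they are in Table 2. In that setting accuracy can stay high even when precision and recall are noticeably lower, which is why all three are reported.

```python
def recognition_metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall as in Eqs. (14)-(16).

    tp / fn: abnormal behaviors recognized as abnormal / nonabnormal.
    fp / tn: nonabnormal behaviors recognized as abnormal / nonabnormal.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # guard against no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0      # guard against no abnormal samples
    return accuracy, precision, recall

# Hypothetical counts: 50 abnormal and 950 normal test images.
acc, prec, rec = recognition_metrics(tp=45, fn=5, fp=3, tn=947)
# acc = 0.992, prec = 0.9375, rec = 0.9
```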

4. Results and Discussion

The proposed algorithm is used to recognize multipose human motion behavior, and the abnormal behavior recognition results of the proposed algorithm are compared with actual abnormal behavior results, as shown in Table 2.

As Table 2 shows, each of the 10 motion behaviors has between 168 and 667 images, of which between 19 and 63 are abnormal. The numbers of abnormal images recognized by the proposed method are close to the actual numbers, and the recognition rate reaches 100% for walking, jumping, standing, squatting, and hands up. The difference between the abnormal behaviors recognized by the proposed algorithm and the actual abnormal behaviors is therefore very small, which shows that the proposed algorithm is highly effective at recognizing abnormal human multipose motion behavior.

One image is randomly selected and segmented by the proposed algorithm, and the resulting human motion target detection is shown in Figure 3.

As Figure 3 shows, the proposed algorithm accurately detects human motion targets in multipose human motion images. Its effective image segmentation allows human motion targets to be extracted accurately, which provides the basis for accurate recognition of abnormal motion behavior.

The proposed algorithm is compared with the algorithms in literature [10–14]. The accuracy and recall of the six algorithms for recognizing abnormalities in multipose motion behavior on the experimental data set are shown in Figures 4 and 5. The squatting posture is used for the same-posture test.

As Figures 4 and 5 show, the accuracy of the proposed algorithm is always higher than 99% across different postures, and for abnormal behavior recognition within the same posture its accuracy fluctuates between 99% and 100%. In contrast, the accuracy of the algorithms in literature [10], [11], [12], [13], and [14] fluctuates between 90% and 98.5% across different postures and is always lower than 96% for the same posture. The recognition accuracy of the proposed algorithm is therefore significantly higher than that of the other five algorithms. In addition, the recall of the proposed algorithm is higher than 99% in both the multipose and the same-posture tests, whereas the recall of the five comparison algorithms is lower. These comparisons verify that the proposed algorithm performs well at recognizing anomalies in human motion behavior: it effectively segments multipose human motion behavior images and applies the GAN to the segmented images to recognize anomalies accurately.

To further measure the recognition performance of the algorithm, the $F_1$ value is selected as a test index for recognizing abnormal human multipose motion behavior, and the $F_1$ values of the six algorithms on the two data sets are computed. The $F_1$ value is the harmonic mean of precision and recall, $F_1 = 2 \cdot \text{Precision} \cdot \text{Recall} / (\text{Precision} + \text{Recall})$, so it is high only when both precision and recall are high. The statistical results are shown in Figure 6.
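A short sketch shows why the $F_1$ value rewards balanced performance: as a harmonic mean it is pulled toward the smaller of precision and recall. The inputs below are illustrative, not measured values from the experiments.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(0.9, 0.9)     # equals 0.9
lopsided = f1_score(1.0, 0.5)     # about 0.667, well below the arithmetic mean 0.75
```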

As Figure 6 shows, the $F_1$ value of the proposed algorithm for recognizing abnormal multipose human motion behavior is higher than that of the other five algorithms, which again verifies its high recognition performance. This is because the algorithm uses a Gaussian model to segment the human multipose motion behavior image and takes the foreground of the segmentation result as the human target, which reduces the difficulty of recognition, improves the recognition accuracy of human posture behavior, and raises the $F_1$ value.

Real-time performance is very important for a recognition algorithm. The times the six algorithms take to recognize abnormal human multipose motion behavior are shown in Figure 7.

As Figure 7 shows, the average time the proposed algorithm takes to recognize abnormalities in human multipose motion behavior is less than 200 ms, which verifies its good real-time performance. This is because the proposed algorithm uses the Shi-Tomasi algorithm to extract contour feature points from the human motion target detection results, and the extracted contour features are set as latent-space random variables and input into the GAN to determine whether the behavior is abnormal. GAN is a mature technology with practical advantages in applications such as image quality enhancement. Because the proposed algorithm has a wide range of applications, real-time performance is essential, and its fast recognition improves its practical applicability.

5. Conclusions

A GAN is applied to anomaly recognition of human multipose motion behavior. First, the multipose human motion image is segmented to obtain the human motion target area, and human motion contour features are extracted from that area. The GAN then uses the extracted features to recognize abnormalities in human multipose motion behavior. Experiments show that the recognition results are highly accurate and that the proposed algorithm is feasible. The proposed algorithm not only recognizes abnormal human multipose motion behavior with high accuracy but also offers good real-time performance. It also addresses the low recognition performance that previous methods suffered when sample data were scarce.

However, in specific settings, images of multipose human behavior are strongly affected by lighting, which makes image acquisition difficult and leaves the collected images prone to distortion. For practical human behavior recognition, combining light compensation techniques with motion target detection algorithms to improve the quality of the data set requires further research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest with any financial organizations regarding the material reported in this manuscript.