Abstract

With the development of the live broadcast industry, security issues in the live broadcast process have become increasingly apparent. At present, supervision on most live broadcast platforms is essentially manual, relying mainly on user reports and platform supervision measures. However, a large number of live broadcast rooms are active at the same time, and human supervision alone can no longer meet the monitoring needs of live broadcasts. To address this, this study proposes a machine learning-based method for recognizing violation information on live-broadcasting platforms. By analyzing the similarities and differences between normal and violation live broadcasts, combined with the characteristics of violation image data, this study mainly detects human skin color and sensitive parts. A prominent feature of violation images is that they contain a large area of naked skin, so the ratio of the naked skin area to the overall image area exceeds a threshold. Skin color recognition serves as the initial target positioning step. Its accuracy is directly related to the recognition accuracy of the entire system, so skin color recognition is the most important part of violation information recognition. Although there are many effective skin color recognition technologies, their accuracy and stability still need to be improved because of external factors such as light intensity, light source color, and physical equipment. When the skin-colored area in the live screen is detected to exceed the threshold, the video is preliminarily determined to be a suspected violation video. To improve recognition accuracy, sensitive parts of the suspected video must then be detected. Naked female breasts are a very obvious feature in violation images. This study uses a chest feature extraction method to detect the chest in the image. When the recognition result is a violation image, the live broadcast is determined to involve violation content. The machine learning algorithm is simple to implement, and its parameters are easy to adjust. Classifier training takes little time, which suits live violation information recognition scenarios. The experimental results on the adopted data set show that the method can effectively detect videos with violation content. The recognition rate reaches 85.98%, which indicates that the method is suitable for real-life environments and has practical significance.

1. Introduction

In recent years, the webcast industry has developed rapidly and has become a main channel of entertainment for most netizens. This explosive growth has also brought disorder to live broadcasting. Some webcast platforms lack a sense of responsibility and do not employ full-time content supervisors. Anchors on live broadcast platforms often lack self-discipline and have weak legal awareness, which has led to an endless stream of undesirable phenomena such as online violence and online obscenity. Pornography, gambling, drugs, and covert filming appear during live broadcasts and seriously endanger the physical and mental health of netizens. They also have a serious negative impact on the healthy development of society. Because of the large number of live broadcast platforms, the wide coverage of content, and the inconsistent broadcast times, relying solely on network police for monitoring is clearly inadequate. Therefore, intelligent violation information recognition technology needs to be introduced into the field of webcasting. Currently, violation information recognition is mainly carried out from the following aspects. The first is skin color recognition. The core principle is to judge that an image contains violation content when the area of skin in the frame exceeds a prescribed threshold. When multiple consecutive images show the same situation, the video is determined to be a violation video. Reference [1] used skin color segmentation and geometric features of human pose to identify sensitive images. The basic idea is to treat the human body as a combination of several columnar regions according to certain rules. However, its disadvantages are that it can process only a single type of image, lacks adaptability and versatility, is slow, and has a low image recognition rate.
Reference [2] proposed the use of skin color recognition, texture analysis, and feature vector classification to identify sensitive pictures. Because a simple skin color model is used, the judgment depends largely on the result of skin region extraction, which is a significant limitation. Reference [3] uses key frames to build an XYZ model. The recognition of a single frame is taken as the X axis, the framing is the Y axis, and the X and Y axes are used as parameters to construct the Z axis, which determines whether the frame contains sensitive images. Many subsequent studies have focused on improving the recognition rate of human skin color [4–6]. The second is human behavior recognition. Human behavior recognition processes time-varying data: parameters describing the static posture of the human body are selected from a series of key frames in the image sequence, connected into a parameter set, and matched against predefined action templates, so that the computer can describe the behavior of the human body in natural language. There are two common human behavior recognition technologies for video: (1) Methods based on model matching. Yuille et al. proposed a method based on flexible templates to extract facial features. Sakai et al. used submodels of facial features and face contours to detect frontal faces in images. Craw et al. proposed a recognition method based on frontal faces: the edges are first extracted and connected, the face template library is then checked against the relevant conditions, and finally the same method is applied at different sizes to search for facial features. (2) Methods based on state space. Reference [7] proposed describing behaviors with optical flow direction histograms, which opened up a new approach to human behavior recognition. The third is recognition based on sensitive parts. Sensitive parts mainly refer to parts such as breasts.
This approach centers on training the recognition model. Commonly used recognition models mainly include the support vector machine (SVM) [8, 9], the AdaBoost model [10–12], and neural networks [13–15]. The fourth is recognition technology based on the human face. The main subject of violation images is undoubtedly the human body, and the human face is an important part of it. Apart from a few violation images that are close-ups of sensitive parts, most violation images contain facial information. Therefore, accurate face recognition can also serve as one of the judging features of violation images. Face recognition is mainly divided into three categories: (1) Methods based on skin color [16, 17]. Similar to skin color recognition, the judgment is based on the difference between the color of the facial skin and the background. This approach places relatively high demands on color quality. (2) Feature-based methods [18–20]. The judgment is based on the texture features of the face. This approach is less sensitive to color and is more robust across different skin colors. (3) Methods based on statistics [21, 22]. Statistics are gathered from face image data, and a classifier is then designed to classify the images to be detected.

It can be seen from the above research that most types of violation recognition require a well-performing classifier. Current classifiers are mainly based on either machine learning or deep learning. In the field of machine learning, the commonly used classification models mainly include the Gaussian model [23, 24], SVM [25, 26], AdaBoost [27, 28], and fuzzy systems [29, 30]. Machine learning algorithms often need to be paired with suitable feature extraction methods to obtain the desired results. Commonly used feature extraction methods mainly include wavelet decomposition and principal component analysis [31, 32]. In the field of deep learning, the commonly used classification models mainly include convolutional neural networks [33, 34], recurrent neural networks [35, 36], and long short-term memory neural networks [37]. This article proposes a machine learning-based violation information recognition method for live webcast scenes. The method supports fast, real-time recognition and is suitable for the field of webcasting. The main work of this study is summarized as follows:

(1) For the recognition of violation information in live broadcasts, skin color recognition is first used for preliminary filtering. Specifically, because the YCbCr color space is simple and makes it easy to separate the brightness component of the image, this study uses the Gaussian model to train in the YCbCr color space to obtain a skin color recognition model. The test images are then evaluated with the trained Bayesian YCbCr skin color model. Each test image is converted from RGB space to YCbCr space, and its CbCr components are input into the Bayesian classification model to obtain the skin color segmentation result.

(2) Because the skin color recognition result contains errors, a second round of recognition is performed on the suspected video data to improve recognition accuracy. Specifically, Haar-like features are extracted as female breast recognition features. Haar-like features describe the gray-level features of local areas. Within a training window of fixed pixel size, weak classifiers with strong classification ability are trained and selected. The weak classifiers are combined into a strong classifier, and a cascade classifier is then obtained by training. Finally, the trained cascade classifier is applied to the test samples with a sliding pixel window.

(3) To verify how well the method recognizes violation content on live broadcast platforms, experimental comparisons are carried out on the data set. To illustrate the effectiveness of the two recognition stages, the experiments compare skin color recognition alone, sensitive part recognition alone, and the cascade of the two. To verify the applicability and superiority of the classification model selected at each stage, the experimental part also compares different classification models. The experimental results show that the method in this study has a good recognition effect and practical value.

2.1. Violation Information Recognition Based on Skin Color

Violation image recognition belongs to image recognition. Due to the complex image background and the susceptibility to illumination, as well as the diversity of the target human body’s gestures, it is difficult to use a single model to represent the characteristics of violation images. A prominent feature of violation images is the large area of bare skin. Various studies have shown that the basis of violation image recognition technology is the extraction of the skin area in the image. Firstly, it is judged whether the pixels in the image are skin pixels; on this basis, the next step of judgment and recognition is carried out, and finally, the whole image is judged.

Skin color recognition technology is mainly used in face recognition, human body recognition, and violation image identification. For violation images, the most important thing is to detect the naked human skin in the image. Extensive nudity is the most important feature of violation images. The ratio of the naked skin area of the violation image to the overall image area must exceed a certain threshold. The current mainstream violation image recognition technologies are based on skin color recognition, and subsequent analysis and judgment are made on this basis. Skin color recognition has played a role in locating the target initially. The accuracy of skin color recognition is directly related to the recognition accuracy of the entire system, so skin color recognition is the most important part of the entire violation content recognition system. Many researchers have discussed the skin color recognition technology. Due to the influence of various external factors, such as light intensity, light source color, and physical equipment, it is difficult to determine skin color simply and accurately. So far, there is no stable and effective method to determine the part of the skin in the image. Most of the current research is devoted to improving the accuracy and stability of the algorithm. The violation video recognition process based on skin color is shown in Figure 1:

As shown in Figure 1, the collected live video is first decomposed frame by frame into images. Second, the skin color feature of each image is extracted. Third, the skin area is calculated. When the area is greater than the set threshold, the image is determined to be an abnormal image; otherwise, it is determined to be a normal image. Finally, when multiple consecutive frames are judged to be abnormal images, the video is considered to contain violation content.
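The decision logic in Figure 1 can be illustrated with a minimal Python sketch. It assumes that skin detection has already produced a binary mask per frame; the function names, the area threshold of 0.3, and the run length of 3 consecutive frames are illustrative assumptions, not values taken from this study.

```python
def skin_ratio(mask):
    """Fraction of pixels flagged as skin in a binary mask (list of rows of 0/1)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def is_abnormal_frame(mask, area_threshold=0.3):
    # area_threshold is an assumed value chosen only for illustration.
    return skin_ratio(mask) > area_threshold


def video_has_violation(frame_flags, min_consecutive=3):
    """True if at least min_consecutive adjacent frames were flagged abnormal."""
    run = 0
    for flagged in frame_flags:
        run = run + 1 if flagged else 0  # a normal frame resets the run
        if run >= min_consecutive:
            return True
    return False
```

Requiring a run of adjacent abnormal frames, rather than a single one, keeps an isolated false detection from flagging the whole video.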

2.2. Violation Information Recognition Based on Human Face

Studies have shown that the share of skin-colored area occupied by the face is much greater in normal images than in violation images. Therefore, using face recognition to measure the proportion of skin color occupied by the face can reduce the false recognition rate for normal images while having almost no effect on the judgment of violation images. There is a large body of research on face recognition, but each approach has its shortcomings. At present, the face recognition method most widely used in practice is based on the AdaBoost algorithm. This article uses Viola's face recognition method, which is based on a cascade detector. The face recognition process based on cascaded detectors is shown in Figure 2.

As shown in Figure 2, Haar-like features are first used to characterize the face. To accelerate the calculation of the feature values, the integral image technique is applied. Second, a series of weak classifiers best suited to face determination is selected through the AdaBoost algorithm. Multiple weak classifiers are then combined into one strong classifier by voting. Finally, multiple strong classifiers are connected in series to form an efficient classifier with a cascade structure.
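To make the speed-up concrete, the following Python sketch builds an integral image (summed-area table) and evaluates one two-rectangle Haar-like feature with a constant number of lookups. It is a generic illustration of the technique rather than the implementation used in this study; the function names and the choice of a left-minus-right feature are assumptions.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of img[0:y][0:x], so any rectangle sum
    needs only four lookups.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Once the table is built, the cost of a feature no longer depends on the rectangle size, which is what makes evaluating many features per window affordable.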

The general operation steps for a specific implementation are as follows:

(1) Extract features from a picture.

(2) Thousands of cropped face pictures and tens of thousands of background pictures are used as training samples. Usually, the training images are uniformly scaled to a square of fixed size. At this size, the number of Haar features that can be used is about 10,000. The AdaBoost algorithm is then used to select thousands of effective features from these tens of thousands of features to construct a classifier.

(3) After the face detector has been trained, it can be used. Before recognition, the image is scaled by a certain ratio. On the scaled picture, a subwindow is moved across the image to determine, position by position, whether it contains a human face or not.
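The feature selection in step (2) can be sketched with discrete AdaBoost over single-feature threshold classifiers (decision stumps). This is a minimal, unoptimized Python illustration on precomputed feature vectors; the stump search, the number of rounds, and all names are assumptions made for the sketch and do not reproduce the training procedure of this study.

```python
import math


def train_stump(X, y, weights):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[f] >= thr else -polarity for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost; labels in y are +1 (target) or -1 (background)."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, polarity = train_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, polarity))
        # Raise the weight of misclassified samples, lower the rest.
        for i, row in enumerate(X):
            pred = polarity if row[f] >= thr else -polarity
            weights[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, row):
    """Weighted vote of the selected weak classifiers."""
    score = sum(a * (p if row[f] >= t else -p) for a, f, t, p in model)
    return 1 if score >= 0 else -1
```

Each round selects one feature, so the number of rounds bounds how many of the candidate Haar features end up in the strong classifier.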

2.3. Violation Information Recognition Based on Sensitive Parts

Nudity is an important visual feature of violation images, and recognizing sensitive parts in violation shots is more specific. Based on the texture characteristics of sensitive parts, a sensitive part classifier is trained to detect pornography, which allows violation content to be detected more accurately. The sensitive part classifier is trained in the same way as the AdaBoost classifier used in face recognition. The core steps of sensitive part recognition based on AdaBoost learning are as follows. First, the binary image obtained from skin color recognition, face recognition, and connected-domain extraction is segmented to obtain the region to be inspected. Next, breast classifier recognition is performed with a sliding window; if it fails, the image is judged to be a normal image. Then, areola classifier recognition is performed with a sliding window; if it fails, the image is still judged to be a normal image. If it passes, the image is judged to be a suspected violation image. The process is shown in Figure 3, where breast classifier recognition and areola classifier recognition are the core of the method.
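The two-stage sliding-window check can be sketched in Python as follows. The two classifiers are passed in as hypothetical callables, and the window size and step are assumed values; the sketch also assumes that a window must pass both classifiers, which is one plausible reading of the flow in Figure 3.

```python
def sliding_windows(width, height, win, step):
    """Yield top-left corners of win x win windows that fit inside the region."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield x, y


def classify_region(width, height, first_clf, second_clf, win=24, step=8):
    """Return 'suspected violation' if any window passes both classifiers.

    first_clf and second_clf are hypothetical callables taking (x, y, win)
    and returning True when the window is accepted. The second classifier
    is only consulted for windows that the first one accepts.
    """
    for x, y in sliding_windows(width, height, win, step):
        if first_clf(x, y, win) and second_clf(x, y, win):
            return "suspected violation"
    return "normal"
```

Because the second classifier runs only on windows accepted by the first, most of the region is rejected after a single test.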

3. Violation Information Recognition on Live Video

3.1. Recognition Framework

Figure 4 shows the violation information recognition framework used in this article. First, analyze and preprocess the input image. This work is mainly to pave the way for subsequent image analysis and individual feature recognition. The preprocessed image is subjected to the skin color recognition in the first step, and the skin color recognition result is obtained. When the result is a suspected violation image, the image is input to the second stage of sensitive part recognition. When the recognition result of the sensitive part is a violation image, the image is determined to be a violation image. When multiple adjacent frames of images are determined to be violation images, it is determined that the live video contains violation content.

3.2. Bayesian Skin Color Recognition

The Bayesian skin color recognition process is shown in Figure 5. First, the skin color and nonskin color parts are extracted manually from each sample image. Second, the probability maps of the skin color and nonskin color parts of the sample images are calculated: after manual segmentation, histogram statistics are computed on the CbCr components and normalized. In this statistical process, the RGB value of each pixel in the skin color and nonskin color parts is first converted to the YCbCr space. The CbCr components of the skin color and nonskin color pixels are then accumulated separately and normalized, which yields two-dimensional probability statistics of skin color and nonskin color over the CbCr components. Finally, Bayesian classification is computed on CbCr to obtain the Bayesian classification model of skin color on the CbCr components.
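The histogram statistics described above can be sketched in Python. The sketch assumes the full-range BT.601 (JFIF) RGB to YCbCr conversion and integer-valued bins; the study does not state which conversion variant or bin width it uses, and the function names are illustrative.

```python
from collections import Counter


def _clamp_byte(value):
    """Round to the nearest integer and clamp to the 0..255 range."""
    return max(0, min(255, int(round(value))))


def rgb_to_cbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion; returns (Cb, Cr) and drops Y."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp_byte(cb), _clamp_byte(cr)


def cbcr_histogram(pixels):
    """Normalized 2-D histogram over (Cb, Cr) for a list of (R, G, B) pixels.

    The result estimates the probability of a CbCr pair given the class
    (skin or nonskin) that the pixels were labeled with.
    """
    counts = Counter(rgb_to_cbcr(*p) for p in pixels)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}
```

Calling `cbcr_histogram` once on the manually labeled skin pixels and once on the nonskin pixels gives the two probability maps that the classification step compares.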

This study chooses YCbCr as the color space for skin color recognition. The first reason is that YCbCr is widely used in many vision technologies. YCbCr has a composition principle similar to the human visual perception process and can be directly applied to image formats such as JPEG or MPEG without having to convert it into other color spaces. The second reason is that the YCbCr color space has the advantage of separating the brightness components in the image. The distribution of skin pixels on the YCbCr channel is relatively concentrated and has good cohesion. The third reason is that the YCbCr color space coordinate representation is simpler than other color spaces.

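The Bayesian method [38] solves statistical problems on the basis of Bayesian theory and links the prior probability of an event to its posterior probability. For skin color recognition, the unknown sample $x$ is classified into either the skin color category $\omega_1$ or the nonskin color category $\omega_2$. Let $\lambda_{ij}$ denote the cost of $x$ in $\omega_j$ being designated as $\omega_i$. When $i = j$, the classification is correct, that is, it is correctly detected whether $x$ is a skin pixel, and $\lambda_{ij}$ is the cost of correct classification. When $i \neq j$, the classification is wrong, that is, a skin color point is detected as nonskin color, or a nonskin color point is falsely detected as skin color, and $\lambda_{ij}$ is the cost of misclassification. Let the variable $R_i(x)$ be the total cost borne when the sample $x$ is assigned to type $\omega_i$. According to the above assumptions, we can get

$$R_i(x) = \sum_{j=1}^{2} \lambda_{ij} P(\omega_j \mid x), \quad i = 1, 2.$$

Here $P(\omega_j \mid x)$ represents the conditional probability that the unknown sample $x$ belongs to $\omega_j$. The decision rule stipulates

$$x \in \omega_1 \ \text{if} \ R_1(x) < R_2(x), \quad \text{otherwise} \ x \in \omega_2.$$

It can be obtained from the above formulas that $x \in \omega_1$ when

$$(\lambda_{21} - \lambda_{11}) P(\omega_1 \mid x) > (\lambda_{12} - \lambda_{22}) P(\omega_2 \mid x).$$

Finally, substituting the Bayesian formula $P(\omega_j \mid x) = p(x \mid \omega_j) P(\omega_j) / p(x)$ gives

$$(\lambda_{21} - \lambda_{11}) \, p(x \mid \omega_1) P(\omega_1) > (\lambda_{12} - \lambda_{22}) \, p(x \mid \omega_2) P(\omega_2).$$

After sorting, the following formula is obtained:

$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \tau,$$

where

$$\tau = \frac{(\lambda_{12} - \lambda_{22}) P(\omega_2)}{(\lambda_{21} - \lambda_{11}) P(\omega_1)},$$

assuming that a misclassification costs more than a correct classification, so that $\lambda_{21} > \lambda_{11}$.

The key to this method is determining an optimal threshold $\tau$ to achieve accurate skin color recognition. When $p(x \mid \omega_1) / p(x \mid \omega_2) > \tau$, the CbCr value is a skin color point; otherwise, it is a nonskin color point. According to reference [39], the optimal value of $\tau$ is between 2 and 4. This article uses a single-point skin color recognition method: the CbCr color information is extracted from a single pixel, and a decision is then made based on the statistical histogram information. This method achieves better results and is also faster.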
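The single-point decision can be sketched in Python as a likelihood-ratio test on the two normalized CbCr histograms. The histograms are assumed to be dictionaries keyed by (Cb, Cr) pairs; the default threshold of 3.0 is just a value inside the range quoted above, and the small floor `eps` for unseen bins is an assumption added for numerical safety.

```python
def is_skin(cbcr, skin_hist, nonskin_hist, tau=3.0, eps=1e-6):
    """Label a (Cb, Cr) pair as skin when the skin-to-nonskin ratio exceeds tau.

    skin_hist and nonskin_hist map (Cb, Cr) to a normalized histogram value.
    eps is an assumed floor that keeps bins never seen in the nonskin
    samples from causing a division by zero.
    """
    p_skin = skin_hist.get(cbcr, 0.0)
    p_non = max(nonskin_hist.get(cbcr, 0.0), eps)
    return p_skin / p_non > tau
```

Raising the threshold makes the skin decision stricter and lowering it makes it more permissive, which is the trade-off the threshold choice controls.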

3.3. Sensitive Part Recognition Based on AdaBoost

The Haar-like feature is extracted as the main feature for female breast recognition. It describes the gray-level features of local areas. Used alone, this breast feature extraction method easily misjudges the eyes, belly button, and other parts as breasts. To avoid this problem as far as possible, the method exploits the fact that the red component of chest color is comparatively strong. A color characteristic value of the chest is defined from two quantities: the information difference between the R channel and the B and G channels, and the information difference between the B and G channels.

Both quantities are computed from the component values of each pixel on the red, green, and blue (RGB) channels. The average characteristic value of the target area is then obtained by averaging over all pixels in that area.

When this average exceeds the set threshold, the red color feature of the area is considered strong, and misrecognized areas such as the eyes and belly button can be excluded. Because the calculation covers only the candidate target area, which is small, it has little effect on processing speed.
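The averaging and thresholding step can be illustrated with a short Python sketch. The per-pixel formula used here, red minus the mean of green and blue, is a hypothetical stand-in and not the definition used in this study, and the threshold value is likewise assumed.

```python
def red_dominance(r, g, b):
    # Hypothetical stand-in for the per-pixel color characteristic value:
    # red minus the mean of green and blue.
    return r - (g + b) / 2


def mean_red_dominance(pixels):
    """Average the per-pixel value over all pixels of the target area."""
    return sum(red_dominance(*p) for p in pixels) / len(pixels)


def passes_color_check(pixels, threshold=20.0):
    # threshold is an assumed value for illustration only.
    return mean_red_dominance(pixels) > threshold
```

Only candidate windows that already passed the Haar-like classifier need this check, so the extra cost stays proportional to the small target area.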

Within a training window of fixed pixel size, weak classifiers with strong classification ability are trained. The weak classifiers are then combined into a strong classifier according to a given combination rule, and a cascade classifier is trained from the strong classifiers. Finally, the trained cascade classifier is applied to the test samples with a sliding pixel window. Figure 6 shows the structure of the AdaBoost cascade classifier.
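How a cascade evaluates one window can be sketched in Python. Each stage is modeled as a weighted vote of threshold tests on precomputed feature values, followed by a stage threshold; this data layout and the names are assumptions made for illustration and do not describe the trained classifier in Figure 6.

```python
def strong_classifier_score(features, weak_classifiers):
    """Weighted vote: each weak classifier is (alpha, feature_index, threshold)."""
    return sum(alpha for alpha, idx, thr in weak_classifiers if features[idx] >= thr)


def cascade_accepts(features, stages):
    """Run stages in order; reject as soon as one stage's score falls short.

    stages is a list of (weak_classifiers, stage_threshold) pairs.
    """
    for weak_classifiers, stage_threshold in stages:
        if strong_classifier_score(features, weak_classifiers) < stage_threshold:
            return False  # early rejection keeps most windows cheap
    return True
```

Most background windows fail an early stage, so only a small share of windows ever reaches the later, more expensive stages.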

4. Experiment and Analysis

4.1. Experimental Data

The experimental data set adopted in this article is a public video data set. The specific representation of this data set is shown in Table 1. Figure 7 is an example of this adopted data set.

4.2. Experimental Environment

To analyze the performance of each method, the evaluation indicators used in this study are precision, recall, and F1-measure (F1). The calculation formula of each evaluation index is as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{15}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{16}$$

$$F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}} \tag{17}$$

where $TP$ stands for the number of positive samples predicted as positive, $TN$ for negative samples predicted as negative, $FP$ for negative samples predicted as positive, and $FN$ for positive samples predicted as negative. The $\beta$ in Equation (17) is the weight value. With the form given in Equation (17), recall is weighted more heavily when $\beta$ is greater than 1, and precision is weighted more heavily when $\beta$ is less than 1. In this study, $\beta$ is set to 1, indicating that precision and recall are treated as equally important. The values of the above three indicators all lie between 0 and 1. The closer the value is to 1, the better the classification effect of the model.
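As a worked illustration, the three indicators can be computed from the confusion counts with a few lines of Python. The sketch assumes the standard F-measure form with a weight parameter `beta`; the function name and the zero-division handling are choices made for the example.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta) from confusion counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta
```

For example, 8 true positives, 2 false positives, and 8 false negatives give a precision of 0.8 and a recall of 0.5, and with `beta=1` the F1 value is about 0.615.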

The hardware configuration used in this experiment is as follows: the CPU is an Intel Core i7, the graphics card is a GTX960M with 4 GB of memory, and the system memory is 16 GB. The operating system is 64-bit Windows 10, and the development language is MATLAB.

4.3. Experimental Results and Analysis

The experiment mainly conducts comparative analysis from three aspects. One is to perform skin color recognition only, the other is to detect sensitive parts only, and the third is to combine the two recognition methods.

4.3.1. Skin Color Recognition Experiment

To compare the effectiveness of the Bayesian model for skin color recognition in this study, the comparison model uses artificial neural networks (ANN) [40] and SVM [41]. The experimental results are shown in Table 2:

It can be seen from the experimental results that the skin color recognition gap based on the three models is not large. The Bayesian classification model based on YCbCr space has achieved the best recognition effect. Because the distribution of skin pixels on the CbCr component has good cohesion, the Bayesian classification model based on YCbCr space can achieve better skin color recognition results. This method is simple and fast. The classification result based on the Bayesian criterion is also more accurate. The experimental results in this study are in line with expectations. However, this method strongly depends on the training sample images, and the comprehensive and reasonable selection of the sample library has a greater impact on the results. At the same time, this method is a static segmentation method. After the classifier is determined, the segmentation threshold of the test image cannot be flexibly changed. Therefore, if the judgment condition is adjusted, when the judgment condition for skin color is strengthened, the recognition result produced by the trained classifier will make it difficult to detect part of the skin color. And when the skin color judgment condition is relaxed, some background points will be mistakenly detected as skin color points.

4.3.2. Sensitive Part Recognition Experiment

Sensitive part recognition is performed on the original data. The comparison models include the SVM and TSK fuzzy systems [42]. The results of sensitive part recognition by each model are shown in Table 3.

The data in Table 3 shows that the recognition effect of sensitive parts based on the AdaBoost model is obviously better. The Haar feature classifier has a high positive recognition rate and a high false recognition rate. Many nontarget areas are also mistaken for sensitive parts. Appropriately increasing the number of classifier stages can reduce the false recognition rate, but the false recognition rate is still high. At the same time, an increase in the number of classifier stages will also lead to an increase in computing time. Compared with face recognition, the characteristics of the areola are not obvious, except that the color of the center of the area is darker than the surrounding area, and there is no more obvious feature available. This is why the overall accuracy of recognition of sensitive parts is lower than that of skin color recognition.

4.3.3. Recognition Experiment Based on the Proposed Framework

To improve the recognition accuracy, this research proposes to use multistep recognition. First, the original sample is input to the skin color recognition module, and the suspected violation sample is detected. Second, input the suspected sample to the sensitive part recognition module. The test results of the framework proposed in this study are shown in Table 4. It can be seen from the experimental results that the framework proposed in this article greatly improves the recognition accuracy of violation content. This is because the recognition of multiple links can reduce the false recognition rate.

5. Conclusion

To recognize violation content in live broadcast scenes, this study proposes a recognition framework based on skin color and sensitive parts. In the skin color recognition stage, a Bayesian recognition method based on the YCbCr space is used. However, relying on skin color recognition as the only criterion for violation content cannot meet the needs of practical applications. Images such as revealing celebrity portraits or ID-style avatar photos also contain a large amount of naked skin, so judging them by skin color alone produces a high false recognition rate. Therefore, a sensitive part recognition stage is added, which uses an AdaBoost-based recognition method for bare breasts. The experimental results show that the proposed framework can effectively detect whether a live broadcast scene contains violation content, which gives this research good practical value. To shorten the recognition time, this research omitted the face recognition stage, at the cost of a higher false recognition rate. The next step is to improve recognition accuracy while increasing recognition time as little as possible. This research intends to introduce deep learning to extract deep image features and thereby improve recognition efficiency.

Data Availability

The labeled data set used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported in part by the National Social Science Fund Youth Project under Grant 17BGL102, Excellent Project of Jiangsu Province Social Science Union under Grant 15SYC-043, Soft Science Research of Wuxi Science and Technology Association under Grant KX15-B-01, and Fundamental Research Funds for the Central Universities under Grant 2015ZX18.