Harmful Content Detection Based on Cascaded Adaptive Boosting
Recently, it has become very easy to acquire various types of image contents through mobile devices with high-performance visual sensors. However, harmful image contents such as nude pictures and videos are also distributed and spread easily. Therefore, various methods for effectively detecting and filtering such image contents are being introduced continuously. In this paper, we propose a new approach to robustly detect the human navel area, which is an element representing the harmfulness of the image, using Haar-like features and a cascaded AdaBoost algorithm. In the proposed method, the nipple area of a human is detected first using the color information from the input image and the candidate navel regions are detected using positional information relative to the detected nipple area. Nonnavel areas are then removed from the candidate navel regions and only the actual navel areas are robustly detected through filtering using the Haar-like feature and the cascaded AdaBoost algorithm. The experimental results show that the proposed method extracts nipple and navel areas more precisely than the conventional method. The proposed navel area detection algorithm is expected to be used effectively in various applications related to the detection of harmful contents.
1. Introduction
In recent years, various types of image media have become easy to acquire and use via the Internet anytime and anywhere, owing to the rapid spread of devices equipped with high-performance visual sensors and high-speed wired and wireless network functions [1–4]. These visual media are used in practical applications such as information and communication, broadcasting, medical care, education and training, augmented reality, transportation, military, games, animation, virtual reality, CAD, and industrial technology [5–8].
However, the Internet has indiscriminately provided users with image content that exposes personal information requiring social control, thereby producing adverse effects [9, 10]. In other words, image content containing personal information or exposed body parts is distributed without any sanction to ordinary people who lack judgment and self-control, including children and adolescents. This not only violates personal privacy but also becomes a social problem.
Furthermore, if video content containing exposed critical body parts is disclosed to the public, the mental damage and side effects for the persons concerned can be severe. In recent years, therefore, attention has focused on research into robustly detecting harmful content that exposes important human body parts using image processing and computer vision techniques, and on effectively filtering the detected harmful content.
Conventional studies on automatically detecting harmful content in various types of input images can be found in the related literature. In one study, a skin color classifier is first used to detect the skin color region in an input image. A region feature classifier then classifies the input image as harmless or harmful by checking the ratio and positional features of the detected skin color region. In another, harmful content is detected in three stages. In the first stage, the image of interest (IOI) is extracted from the background. In the second stage, a weight mask is used to distinguish the IOI from the segmented image. Finally, the color and texture features of the IOI are compared to determine whether it is harmful. In a third, harmful images are detected by applying an image retrieval technique to an image database composed of harmful and nonharmful images: if the number of harmful images among the retrieved images exceeds a predetermined number, the query image is judged to be harmful. In a fourth, the input image is analyzed to detect whether an important component of the exposed human body, such as a nipple, is present, and the image is judged harmful accordingly. In addition to the algorithms described above, various methods have been proposed to extract harmful content more robustly.
Although many of the methods described above detect harmful images reliably to some extent, there are still many limitations. The existing methods for detecting harmful image contents still include false detections. Therefore, this paper proposes a new method of robustly detecting the human navel area using Haar-like features and an AdaBoost algorithm for use in harmful image detection. Figure 1 shows a general outline of the navel detection method proposed in this paper.
As shown in Figure 1, the proposed method first detects the nipple area of a woman by using color information from color images, then it detects candidate navel areas using location information of the detected nipple area, edge images, and saturation images. The nonnavel area is then removed and only the actual navel area is robustly detected through filtering using the Haar-like feature and the AdaBoost algorithm.
The rest of this paper is organized as follows. In Section 2, we describe various existing studies related to harmful image detection used in the fields of image processing and multimedia. Section 3 describes a method for extracting candidate navel regions using color information, edge images, and saturation images. In Section 4, we explain how to remove nonnavel areas from candidate navel regions using Haar-like features and the AdaBoost algorithm and filter only the actual navel areas. In Section 5, experimental results are presented to compare and evaluate the performance of the proposed adult image detection method. Section 6 describes the conclusion of this paper and the future research direction.
2. Related Work
It has become very easy to acquire various kinds of multimedia content as mobile smart devices spread, central processing units become faster, mass storage devices become cheaper, and high-speed wired and wireless network technology develops. On the other hand, harmful content such as naked photographs or adult images is freely distributed via the Internet, which is a social problem. Therefore, a technique for effectively detecting and filtering adult images has been needed in recent years. The following are existing methods for detecting and filtering adult images introduced in the related literature.
A harmful image classification tool that uses the skin color distribution of a human was developed using a skin color classifier. In this method, the classification tool consists of two stages. In the first stage, the skin color region is detected in the input image using the skin color classifier. In the second stage, the region feature classifier classifies the image as harmless or harmful based on the ratio and positional features of the skin color region detected earlier. The skin color classifier learns the RGB color values of harmless and harmful images based on a histogram model, and the region feature classifier learns the skin color ratios of 29 areas of the image using an SVM (support vector machine). The advantage of this method is that the algorithm is relatively simple. Its disadvantage is that a human skin color distribution model alone is limited in its ability to detect adult images photographed in various environments.
The method of using color and texture features detects harmful content in three stages. In the first stage, the IOI is extracted from the background of the input image. In the second stage, the extracted IOI is separated from the segmented image using a new weighted mask, and it is then determined whether to accept the IOI. In the third stage, the color and texture feature values of the region of interest or the original image are examined. If the feature value is greater than a predetermined threshold, the input image is determined to be an adult image. Otherwise, it is determined to be a nonadult image. In this method, ROC (receiver operating characteristic) curve analysis is performed to select the predefined thresholds optimally. In addition, the texture features are detected using a gray level cooccurrence matrix. The disadvantage of this method is that its accuracy drops when the IOI is not accurately extracted.
The method of employing the image retrieval technique first builds an image database from a certain number of harmful and nonharmful images. Then, when a query image is given, the 100 images most similar to the query image are retrieved from the prebuilt image database using color, texture, and shape features. If the number of harmful images among the retrieved images is equal to or greater than a preset value, the query image is judged to be a harmful image. Otherwise, it is judged to be a nonharmful image. This method may produce good results in certain situations, but its disadvantage is that the performance of the entire system depends on the contents of the image database created in advance.
A method of using the components of the human body was proposed that automatically determines whether an input image is harmful using an artificial neural network. In this method, the human face region is first detected in the input image based on the MCT feature. Then, color features are used to find the human skin color area, and candidate regions of the nipple, which is one of the body components representing harmfulness, are extracted. Finally, a hierarchical artificial neural network removes nonnipple areas from the extracted candidate nipple areas so that only the actual nipple area is selected and used to decide whether the input image is harmful. This method can detect the nipple area in adult images where the nipple area of the woman is clearly visible, but it is difficult to use on adult images without a nipple area.
In addition to the methods described above, many new methods for detecting adult images are still being introduced. Most of these methods still use a human skin color distribution model. Methods that use the skin color distribution as the main feature have the advantage of simplicity, but they are limited in robustly detecting adult images taken in various environments because they rely only on skin regions. In addition, the existing methods of detecting harmful images still produce many false positives.
3. Extraction of Candidate Regions
3.1. Detection of Candidate Nipple Regions
This paper first analyzes the input color image to detect human skin color areas. To detect a human skin area robustly, the skin color distribution is assumed to follow an oval distribution model in the color space [17, 18], and skin color pixels are used to estimate the parameters of the oval distribution model, such as the lengths of the major and minor axes. Equation (1) is the oval skin color distribution model used in this paper.
In (1), the inputs are the two color-component values of the input test image. The model parameters are the rotation angle of the oval, the lengths of its major and minor axes, a value that compensates for the rotational error of the oval, and the center of the skin color distribution model. An index identifies each skin sample, and a further factor gives the number of selected skin samples. In this paper, fixed values of the rotation angle (in radians) and of the remaining parameters were used in (1).
We use the learned oval skin color distribution model to exclude other regions from the test image and segment only the human skin color region using (2). Equation (2) relates the color value at each position of the input image to the color values included in the predefined oval skin color distribution model.
As shown in Table 1, the skin region is characterized by a lower average value and a higher value than the nipple region in the color space, but considering the standard deviation of the two regions, both the and values are within the error range. The mean and standard deviation of individual skin color and nipple color differences show that there is a clear difference in color between the two areas within the individual.
After extracting the skin color region from the image, we extract candidate nipple regions by applying a nipple map, defined in (3), to the detected skin color region. The nipple map is derived from the observation that the nipple area of a person is typically reddish and relatively dark.
In (3), the three inputs are the values of the color model components at each image position. All terms in (3) are normalized to values between 0 and 255 for convenience of calculation. One term of (3) emphasizes the nipple region using general nipple color information, and the other emphasizes the nipple area relative to the skin area. In the nipple map images created in this paper, pixels with a higher probability of being a nipple appear bright and pixels with a lower probability appear dark. The candidate nipple regions are first detected by performing Otsu’s adaptive binarization [19–21] and labeling [22, 23] on the area obtained by applying the nipple map.
The candidate nipple regions first detected by applying the nipple map are then filtered using geometric features. As shown in (4), the size feature, extension feature, and density feature of each candidate nipple region are used as the geometric features. The quantities in (4) are the length and width of the image, the number of pixels included in the candidate region, and the horizontal and vertical lengths of the minimum enclosing rectangle (MER) of the candidate region. The size feature indicates the total number of pixels occupied by the candidate nipple region, and the density feature indicates the ratio of the area occupied by the candidate region to the area of its minimum enclosing rectangle. The extension feature represents the ratio of the horizontal length to the vertical length of the candidate region. If the size feature, density feature, or extension feature of a candidate region is smaller than its predefined threshold, the candidate region is judged to be a nonnipple region and excluded from the candidate nipple regions.
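The geometric filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the normalization of the size feature by the image area, and all three threshold values are hypothetical, since the exact form of (4) and the thresholds are not given in the text.

```python
def geometric_features(pixels, image_width, image_height):
    """Compute (size, density, extension) for one candidate region.

    pixels: list of (x, y) coordinates belonging to the region.
    Assumption: the size feature is normalized by the image area.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    # Horizontal and vertical lengths of the minimum enclosing rectangle (MER).
    mer_w = max(xs) - min(xs) + 1
    mer_h = max(ys) - min(ys) + 1
    count = len(pixels)
    size = count / (image_width * image_height)
    density = count / (mer_w * mer_h)   # region area relative to MER area
    extension = mer_w / mer_h           # horizontal length over vertical length
    return size, density, extension


def is_nipple_candidate(pixels, image_width, image_height,
                        min_size=0.0005, min_density=0.5, min_extension=0.5):
    """Keep a region only if no feature falls below its (hypothetical) threshold."""
    size, density, extension = geometric_features(pixels, image_width, image_height)
    return size >= min_size and density >= min_density and extension >= min_extension


# A compact 4 x 4 blob passes; a sparse diagonal and a thin vertical strip do not.
square = [(x, y) for x in range(10, 14) for y in range(10, 14)]
diagonal = [(i, i) for i in range(10)]
strip = [(5, y) for y in range(20)]
print(is_nipple_candidate(square, 100, 100))
print(is_nipple_candidate(diagonal, 100, 100))
print(is_nipple_candidate(strip, 100, 100))
```

In practice the thresholds would have to be tuned on labeled data; the values above only make the example behave sensibly.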
3.2. Detection of Candidate Navel Regions
The candidate navel regions are extracted using the edge, the saturation, and the positional coordinates relative to the detected nipple regions. First, Sobel edges are detected in the input image. The Sobel edge operator is a first-order differential operator that detects edges using the magnitude of the gradient, differentiating once along each of the x and y axes. Figure 2 shows the horizontal and vertical masks corresponding to the Sobel edge operator used in this paper.
Figure 2: Sobel edge operator masks. (a) 3 × 3 mask type. (b) Horizontal mask. (c) Vertical mask.
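The Sobel step described above can be sketched in a few lines. This is a minimal pure-Python illustration rather than the authors' OpenCV implementation; the names `SOBEL_X`, `SOBEL_Y`, and `sobel_magnitude` are hypothetical, and the kernels are the standard 3 × 3 Sobel masks, which are assumed to correspond to those in Figure 2.

```python
import math

# Standard 3 x 3 Sobel masks: SOBEL_X differentiates along x, SOBEL_Y along y.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(gray):
    """Return the gradient magnitude of a grayscale image (list of rows).

    Border pixels are left at 0 because the 3 x 3 window does not fit there.
    """
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = gray[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step edge: columns 0-1 are dark, columns 2-4 are bright.
step = [[0 if x < 2 else 10 for x in range(5)] for _ in range(5)]
edges = sobel_magnitude(step)
print(edges[2])
```

The response is strongest on the two columns adjacent to the step and zero inside the flat bright region.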
Otsu’s binarization algorithm is used to binarize the detected edge. A closing morphological operator [24–26] is applied to the binarized edge image. Usually, a closing morphological operator plays a role of filling holes existing inside an object while keeping the basic shape of an object, or combining adjacent disconnected areas.
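As a rough illustration of the binarization and closing steps described above, the sketch below implements Otsu's threshold selection (maximizing the between-class variance) and a morphological closing with a 3 × 3 structuring element. The function names and the choice of a 3 × 3 element are assumptions; the paper does not state the structuring element it used.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    values: iterable of integer gray levels in [0, levels).
    Pixels with value > t are treated as foreground.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the current threshold
    sum0 = 0    # sum of their gray levels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def _morph(img, combine):
    """Apply `any` (dilation) or `all` (erosion) over each 3 x 3 neighbourhood."""
    h, w = len(img), len(img[0])
    return [[1 if combine([img[yy][xx]
                           for yy in range(max(0, y - 1), min(h, y + 2))
                           for xx in range(max(0, x - 1), min(w, x + 2))]) else 0
             for x in range(w)] for y in range(h)]


def close_binary(img):
    """Morphological closing: dilation followed by erosion."""
    return _morph(_morph(img, any), all)


# A 3 x 3 block with a one-pixel hole in the middle; closing fills the hole.
ring = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        ring[y][x] = 1
ring[3][3] = 0
closed = close_binary(ring)
threshold = otsu_threshold([10] * 50 + [200] * 50)
```

Here the closing fills the hole while leaving the outline of the block unchanged, which is the behaviour the paragraph above describes.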
A saturation image is then extracted from the input image. Although the navel area has few remarkable features compared to other body regions, it is generally characterized by relatively high saturation. Therefore, this paper extracts saturation, which represents the turbidity of the color, and uses it as a feature of the navel area. To this end, the RGB color space of the input image is transformed into the HSV color space using (5), and saturation is then obtained from the S channel.
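For reference, the saturation channel of the usual RGB-to-HSV conversion can be computed per pixel as below. This is the standard definition, S = (max − min) / max with S = 0 for black; it is assumed, not confirmed, to agree with the paper's equation (5), and the function name is hypothetical.

```python
def saturation(r, g, b):
    """Return the HSV saturation (0.0 to 1.0) of one RGB pixel with 0-255 channels."""
    highest = max(r, g, b)
    lowest = min(r, g, b)
    if highest == 0:
        return 0.0  # black: saturation is defined as 0
    return (highest - lowest) / highest


def saturation_image(rgb_image):
    """Map an image given as rows of (r, g, b) tuples to rows of saturation values."""
    return [[saturation(*pixel) for pixel in row] for row in rgb_image]


print(saturation(255, 0, 0))      # pure red is fully saturated
print(saturation(128, 128, 128))  # gray has no saturation
print(saturation(200, 100, 100))
```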
After extracting the saturation image from the input image, we binarize it using Otsu’s algorithm, as with the edge image, and apply a closing morphological operator. From the edge image and the saturation image extracted above, we then select the regions over a certain size in which the two images overlap, as shown in (6). Equation (6) combines the edge image and the saturation image into a single image in which the two are overlapped.
The candidate navel regions are then selected using the positional relationship with the already detected nipple regions. In other words, we select the areas adjacent to the straight line perpendicularly bisecting the line segment connecting the centers of the two detected nipple areas as candidate navel areas. Figure 3 shows an outline of the process of selecting the candidate navel areas.
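The positional constraint described above can be sketched as a point-to-line distance test. The sketch assumes that each region is represented by its centroid and that "adjacent" means within a tolerance proportional to the distance between the nipple centers; both the representation and the `tolerance` value are hypothetical.

```python
import math


def distance_to_bisector(nipple_a, nipple_b, point):
    """Distance from `point` to the perpendicular bisector of segment a-b.

    The bisector passes through the midpoint of a-b and is normal to a-b,
    so the distance is the projection of (point - midpoint) onto the a-b direction.
    """
    ax, ay = nipple_a
    bx, by = nipple_b
    mx, my = (ax + bx) / 2, (ay + by) / 2
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("nipple centers must be distinct")
    return abs((point[0] - mx) * dx + (point[1] - my) * dy) / length


def select_navel_candidates(nipple_a, nipple_b, centroids, tolerance=0.2):
    """Keep centroids lying close to the perpendicular bisector.

    tolerance: hypothetical fraction of the nipple-to-nipple distance.
    """
    limit = tolerance * math.hypot(nipple_b[0] - nipple_a[0],
                                   nipple_b[1] - nipple_a[1])
    return [c for c in centroids
            if distance_to_bisector(nipple_a, nipple_b, c) <= limit]


# Nipple centers on a horizontal line; only the centered candidate is kept.
kept = select_navel_candidates((100, 100), (200, 100), [(150, 250), (180, 250)])
print(kept)
```

A fuller version would probably also require the candidate to lie on the torso side of the nipple line, but the text does not specify such a rule, so it is omitted here.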
4. Detection of Harmful Images Using Learning
4.1. Extraction of Features for Candidate Regions
In this paper, we used extended Haar-like features, as shown in Figure 4, for learning and recognition of the navel area [27–29]. Haar-like features were first introduced for face detection. A Haar-like feature is a rectangular feature whose value is expressed as the difference between the sum of all pixel values in the dark part and the sum of all pixel values in the bright part.
Figure 4: (a) Haar-like features. (b) Navel images. (c) Extended Haar-like features.
Because the Haar-like feature usually takes the form of a rectangle, an integral image can be used for quick calculation. The value of the integral image at a specific position (x, y) is the sum of the pixel values of the original image in the rectangle from position (0, 0) to position (x, y). The integral image at position (x, y) can thus be defined as (7), in terms of the pixel values of the original image at each position.
It takes some time to convert the original image into an integral image, but once the integral image is created, Haar-like features can be calculated very quickly. The extended Haar-like feature proposed by Chai and Wang significantly expands the basic set of Haar-like features and allows for a much better recognition rate.
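The speed-up from the integral image can be made concrete with a small sketch. The code below builds an integral image with one row and one column of zero padding (so that entry [y + 1][x + 1] holds the sum from (0, 0) to (x, y) inclusive), evaluates any rectangle sum with four lookups, and computes a basic two-rectangle Haar-like feature. The function names and the left/right layout of the feature are illustrative assumptions; the extended features of Figure 4 are not reproduced.

```python
def integral_image(img):
    """Return a zero-padded integral image of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of the rows above plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle feature: left half (dark part) minus right half (bright part)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
ii = integral_image(image)
print(rect_sum(ii, 1, 0, 2, 2))      # 2 + 3 + 6 + 7
print(haar_two_rect(ii, 0, 0, 4, 2)) # (1 + 2 + 5 + 6) - (3 + 4 + 7 + 8)
```

Each feature costs a constant number of lookups regardless of the rectangle size, which is why the one-time cost of building the integral image pays off when many features are evaluated per window.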
4.2. Filtering Candidate Areas Using Learning
Boosting is a technique that builds one strong classifier by combining weak classifiers whose individual performance is somewhat poor. Here, a weak classifier means a classifier whose classification accuracy is at least slightly better than 50% (random guessing), and a strong classifier is a classifier with a very small classification error. The advantage of boosting is that it reduces the probability of deriving incorrect results from a single assumption and increases the probability that a difficult problem is judged correctly.
The AdaBoost (adaptive boosting) algorithm [32, 33] is one of the best-known Boost algorithms and has the advantage of being simple and efficient. The use of the AdaBoost learning algorithm makes it possible to acquire more feature values that better express the target object as learning progresses, so that an accurate and robust recognition algorithm can be created. In addition, the AdaBoost learning algorithm has the advantage that it can be used in combination with many other types of learning algorithms to improve performance. Figure 5 shows the overall structure of the AdaBoost learning algorithm.
In this paper, the AdaBoost algorithm is used, through learning and recognition, to determine whether each candidate navel region detected in the previous stage is an actual navel area or a nonnavel area. AdaBoost is adaptive in that subsequent weak classifiers concentrate on the samples misclassified by previous classifiers, which allows it to be applied in a variety of situations. For the same reason, it is sensitive to noisy data and outliers. In other situations, however, it is less susceptible to overfitting than other learning methods. In the AdaBoost algorithm, the final model converges to a strong classifier as long as each individual classifier performs better than random guessing, even if its performance is somewhat low. Pseudocode 1 shows the pseudocode of the overall AdaBoost algorithm.
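To make the reweighting idea above concrete, here is a minimal sketch of discrete AdaBoost with threshold stumps as weak classifiers. It is a generic textbook formulation, not a transcription of the paper's Pseudocode 1; the function names and the toy one-dimensional data set are hypothetical.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Weak classifier: returns polarity if row[feature] >= threshold, else -polarity."""
    return polarity if row[feature] >= threshold else -polarity


def train_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(w for row, label, w in zip(X, y, weights)
                            if stump_predict(row, feature, threshold, polarity) != label)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds):
    """Train `rounds` stumps; labels must be +1 or -1. Returns (alpha, stump) tuples."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        error, feature, threshold, polarity = train_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the others.
        weights = [w * math.exp(-alpha * label *
                                stump_predict(row, feature, threshold, polarity))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, row):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in model)
    return 1 if score >= 0 else -1


# No single threshold separates this pattern, but three boosted stumps do.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
one_round = [predict(adaboost(X, y, 1), row) for row in X]
three_rounds = [predict(adaboost(X, y, 3), row) for row in X]
print(one_round, three_rounds)
```

In the navel detector, each row would presumably hold Haar-like feature values of a candidate window, with one stump selecting one feature per round.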
In this paper, feature values extracted by using the AdaBoost algorithm are grouped by stage as shown in Figure 6. As shown in Figure 6, a stronger recognition algorithm can be implemented through grouping. That is, in the first stage, feature values with a certain level of discrimination ability are grouped with a small number of feature values. In the next stage, a group with a larger number of feature values than the first stage and a discrimination ability similar to the previous stage is created.
In other words, the purpose of using cascaded classifiers is to filter out most negative images in the initial stages while retaining the positive images, so that the remaining stages can focus on classifying the more complex images.
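The early-rejection behaviour of a cascade can be sketched as a loop over stages that stops at the first failure. The stage scoring functions and thresholds below are toy placeholders chosen only to make the example run; in the paper each stage would be a boosted group of Haar-like features.

```python
def cascade_classify(stages, window):
    """Run a window through the stages in order.

    stages: list of (score_function, threshold) pairs, cheapest first.
    Returns (accepted, stages_passed); a window is rejected as soon as
    one stage scores it below that stage's threshold.
    """
    for index, (score, threshold) in enumerate(stages):
        if score(window) < threshold:
            return False, index
    return True, len(stages)


# Hypothetical two-stage cascade over a list of pixel values.
toy_stages = [
    (lambda w: sum(w) / len(w), 10),  # stage 1: cheap test on the mean
    (lambda w: max(w) - min(w), 5),   # stage 2: test on the value range
]

print(cascade_classify(toy_stages, [1, 2, 3]))      # rejected at the first stage
print(cascade_classify(toy_stages, [10, 11, 12]))   # rejected at the second stage
print(cascade_classify(toy_stages, [10, 12, 20]))   # passes every stage
```

Because most negative windows fail the first, cheapest stage, the later and more expensive stages are evaluated only for a small fraction of windows.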
5. Experiment Results
The personal computer used for the experiments has an Intel Core™ i7-4790 processor running at 3.6 GHz, 8 GB of main memory, and Microsoft’s Windows 7 operating system. The proposed method was implemented using Microsoft’s Visual C++ 2015 and the OpenCV open source computer vision library. To compare and evaluate the performance of the proposed algorithm, we collected and used several types of harmful and nonharmful images containing exposed navel areas. These images were taken in various indoor and outdoor environments with no specific restrictions. In this experimental environment, the average processing time of the proposed harmful image detection system is 312 ms for an input image with a resolution of 1280 × 720 pixels. We plan to improve the processing speed of the proposed detection system by optimizing the implemented program and by using a personal computer with higher specifications.
In this paper, we applied and used the AdaBoost classifier in a cascaded manner. In other words, we used the cascaded classifier of 4 stages and used 215 positive images and 2354 negative images for learning. The size of the learning image was normalized to the size of 30 × 30 pixels. Table 2 lists the parameters used in the cascaded AdaBoost algorithm.
Figure 7(a) shows an input image, and Figure 7(b) shows the result of segmenting only the human skin color area from the input test image using the predefined oval skin color distribution model. Figure 8(a) shows the nipple map image obtained by applying the proposed nipple map, based on the three color components, to the segmented skin color area, and Figure 8(b) shows the result of binarizing the nipple map image with the Otsu algorithm.
Figure 7: (a) Input image. (b) Skin area.
Figure 8: (a) Nipple map.
Figure 9(a) shows the result of detecting only the nipple areas of a human by applying geometric features such as size, density, and extension features to the candidate nipple areas extracted using the nipple map, and Figure 9(b) shows the final extraction of the exposed navel area using the relative positional relationship of body components and the AdaBoost learning algorithm. As can be seen in Figure 9(b), the line connecting the centers of the two extracted nipple areas and the line connecting this line and the center of the navel area form a T shape.
Figure 9: (a) Nipple areas. (b) Navel areas.
In this paper, an elliptical skin color distribution model is created by learning a large number of skin pixels included in human skin areas. Therefore, even if the background of the input image is complicated, the skin color extraction function is relatively robust. As a result, there is no problem in the final result of detecting harmful images since nipple and navel regions are obtained from the extracted skin color areas. Naturally, erroneous detection may occur if there are many areas similar to the skin color in the background area. Figure 10 shows an example of robust extraction of skin regions from images with complicated backgrounds. In Figure 10, we can clearly see that our approach extracts skin areas accurately in images with a complex background. Figure 11 shows an example of detecting harmful images in a complicated background.
Figure 10: (a) Input image. (b) Skin areas.
In this paper, we use the accuracy measures in (8) and (9) to quantitatively evaluate the performance of the proposed learning-based navel area extraction method. Both measures are computed from three counts: the number of correctly detected navel areas, the number of regions that are not navel areas but are erroneously detected as navel areas, and the number of navel areas that are present but not detected.
The measure in (8) is the ratio of correctly detected navel areas to all navel areas detected in the input image. The measure in (9) is the ratio of correctly detected navel areas to all navel areas actually present in the input image.
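The two ratios described above correspond to what is commonly called precision and recall. A minimal sketch follows, using the conventional names true positives, false positives, and false negatives for the three counts; the function name and the example counts are hypothetical and are not results from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from detection counts.

    precision: correct detections over all detections.
    recall: correct detections over all navel areas actually present.
    A ratio with an empty denominator is reported as 0.0.
    """
    detected = true_positives + false_positives
    present = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / present if present else 0.0
    return precision, recall


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
precision, recall = precision_recall(90, 10, 30)
print(precision, recall)
```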
To compare and evaluate the accuracy of the image harmfulness detection algorithm, we evaluated the method of using the existing skin color distribution, the method of using only the positional relation between nipple and navel regions in the proposed method, and the entire proposed method of using the AdaBoost algorithm. Figures 12 and 13 graphically show the accuracy measurement results of the image harmfulness detection algorithm obtained through (8) and (9). As shown in Figures 12 and 13, it can be seen that the proposed method of detecting the navel area based on AdaBoost learning reduces the false detection of the navel area, thus determining the harmfulness of the image more accurately.
Among the three methods, the existing method of using the skin color distribution has the lowest accuracy and there were many errors because the presence or absence of image harmfulness is detected using only color information without using any part of the human body. In other words, there is a certain limit to detect all harmful images taken in various indoor and outdoor environments only with skin color distribution. The method of using only the positional relation between nipple and navel areas caused a lot of false detection because candidate navel areas have not been correctly verified. However, the method of using the proposed Haar-like feature and cascaded AdaBoost algorithm carried out the verification of the candidate navel area more accurately through AdaBoost learning and recognition algorithms and eliminated a large number of false positives, resulting in higher accuracy.
In the proposed method, when the skin color learning data cover various races, the skin color map operates normally for different races. For images containing sudden lighting changes or severe noise, the skin color map may not behave as desired, but apart from such exceptional situations, the skin color map is robust to race. Therefore, the proposed method is also suitable for images containing people with different skin colors.
The proposed method detects the nipple area and then detects the candidate navel area using the relative position information of the detected nipple area. Therefore, it is difficult to detect the navel area if the nipple area cannot be detected from the input image. For this case, there is a need for a method of detecting the navel area independently of the detection of the nipple area. That is, there is a need for a method of extracting features representing the navel area, and then directly extracting the navel area based on the extracted features.
Although the proposed method currently targets images containing one person, it can detect harmful images to a certain extent even when two or more people appear in the image, provided they do not overlap too much. For this purpose, after detecting nipples, a function should be added that clusters detected nipples that have similar features and are close to each other. Navels should then be detected for each nipple cluster. Figure 14 shows an example of detecting a harmful image with two people. Experimental result images with more than two people are not shown in the paper because such images are considerably obscene. Much further research is needed on the detection of adult images with more than two people.
Figure 14: (a) Nipple map. (b) Detection result image.
If the input image includes tattoos, birthmarks, and piercing, it is difficult to detect harmful images. In the case of color tattoos, it does not matter if the color is different from the nipple, but if the colors are similar, the nipple cannot be detected. In the case of birthmarks, it is not a big problem since most spots are covered with color makeup when shooting harmful images. However, if there is an uncovered spot near the nipple area, the color map of the nipple can be used to distinguish the spot and nipple area to some degree. If the spot is located near the navel region, false positive detection may occur. In the case of piercing, the nipple region or the navel region may be divided into small regions due to piercing, and erroneous detection may occur. Therefore, to improve the accuracy of the harmful image detection, it is necessary to analyze the input image to separately detect the tattoos, spots, and piercing, remove them, and then detect the harmful image.
6. Conclusion
Recently, various types of image media can be easily acquired and used conveniently via the Internet anytime and anywhere due to the rapid spread of smart devices. However, the Internet has indiscriminately provided users with image content that exposes personal information requiring social control, thereby producing adverse effects. Therefore, many attempts have been made to effectively detect harmful image content such as nude pictures using image processing and pattern recognition techniques.
This paper proposed a new method to detect harmful images automatically by robustly extracting the human navel area from the input image using an extended Haar-like feature and cascaded AdaBoost algorithm. In the proposed method, a skin area is first detected from an input image using a predefined oval skin color model. The nipple area of the exposed person is searched using a newly defined nipple map in the skin area of the detected person. The candidate navel regions were then detected using the physical location features with the nipple area. Finally, using the Haar-like feature and the cascaded AdaBoost algorithm, we effectively filtered only the actual navel areas, except for the nonnavel areas, among the candidate areas of the already detected navel. In other words, when the exposed nipple area and the navel area exist in the input image, the input image was judged to be a harmful image.
In the experimental results, we applied the proposed algorithm to various kinds of harmful and nonharmful images taken in a general environment where no specific constraints are given to quantitatively compare and evaluate the accuracy performance. As a result, it was found that the proposed method of detecting the navel area using the Haar-like feature and the cascaded AdaBoost algorithm detects harmful images more robustly than the existing other methods. In other words, the proposed method uses the AdaBoost learning algorithm to conduct verification of candidate navel areas more systematically, so its accuracy is relatively high compared to existing methods that simply detect and utilize only color or a part of the body.
In future work, we plan to verify the robustness of the proposed harmful image detection algorithm by applying it to indoor and outdoor test images taken in a wider variety of unconstrained environments. We will further stabilize the proposed system by tuning its parameters, including the thresholds used in the navel area detection algorithm implemented so far. We also plan to define and use a new feature that can represent the human navel area, which is less distinctive than other areas.
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there is no conflict of interests regarding the publication of this article.
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1A09917838).
Y.-J. Park, S.-H. Weon, J.-K. Sung, H.-I. Choi, and G.-Y. Kim, “Identification of adult images through detection of the breast contour and nipple,” Information - An International Interdisciplinary Journal, vol. 15, no. 7, pp. 2643–2652, 2012.
L. Ma, X.-P. Zhang, J. Si, and G. P. Abousleman, “Bidirectional labeling and registration scheme for grayscale image segmentation,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2073–2081, 2005.
R. Su, C. Sun, C. Zhang, and T. D. Pham, “A new method for linear feature and junction enhancement in 2D images based on morphological operation, oriented anisotropic Gaussian function and Hessian information,” Pattern Recognition, vol. 47, no. 10, pp. 3193–3208, 2014.