Abstract

The Internet's easy accessibility has allowed diverse types of multimedia content to flow freely on smart phones and tablet PCs. However, multimedia content that can be emotionally harmful to children is also easily spread, causing serious social problems. This paper proposes a method to assess the harmfulness of input images automatically based on an artificial neural network. The proposed method first detects human face areas in the input images based on MCT features. Next, based on color characteristics, it identifies human skin color areas along with candidate areas of nipples, one of the human body parts representing harmfulness. Finally, the method removes nonnipple areas among the detected candidate areas using the artificial neural network. The experimental results show that the suggested neural network learning-based method can determine the harmfulness of various types of images more effectively by robustly detecting nipple regions in input images.

1. Introduction

Recently, digital media players with graphical user interfaces have developed rapidly along with high-speed wired and wireless communication technology, large-scale storage devices, and light, portable mobile devices. As a result, diverse types of multimedia content such as photographs, animations, and high-definition videos are being spread freely [1–3]. Thanks to the remarkable growth of the computing and networking capabilities of mobile devices, including smart phones and tablet PCs now comparable to typical personal computers, mobile device-based multimedia content has come into wide use [4–7].

While anyone can obtain and replay multimedia data through Internet-connected high-speed wired and wireless mobile devices, pornographic videos, naked images, and other adult content are also easily spread to adolescents and children, causing a serious social problem [8]. In this situation, the need in the image security area for automatic assessment and filtering of adult images, which flow in intentionally or unintentionally through various routes, is increasing [9].

Recent literature on image processing and pattern recognition describes existing techniques for assessing image harmfulness. Shih et al. searched for a query image in a database of adult and nonadult images [10]. Among the similar search results, if the number of adult images exceeds a certain level, the query image is assessed to be harmful. Zheng et al. used a multi-Bayes classifier to identify skin areas and acquired the shape features of the extracted skin areas [11]. These researchers then applied the identified shape features to a boosted classifier to assess image harmfulness. Park et al. detected the breast, including the nipple area, using the Hough transform on gray images to assess image harmfulness [12]. Lee et al. defined a human skin color model in advance and used the model to extract skin areas from input images [13]. Then, based on the shape features of the extracted skin areas, they assessed image harmfulness. In addition to these methods, new techniques for assessing image harmfulness automatically continue to be proposed [14].

These existing methods may ensure accuracy to some extent on some image databases. However, their accuracy is not high enough to handle all the diverse types of images captured in different environments. In this situation, we propose a method to assess image harmfulness robustly by detecting human nipple areas with a hierarchical artificial neural network. In this paper, an image is viewed as harmful if the nipples of a naked woman are detected. Figure 1 shows the overall outline of the harmfulness assessment algorithm proposed in this paper.

As shown in Figure 1, the proposed algorithm first detects a human face area in an input image through the modified census transform (MCT) method. Based on color features, the algorithm then identifies human skin color areas and extracts candidate areas of the nipple, one of the human body elements. Lastly, using a hierarchical artificial neural network, the algorithm removes nonnipple areas from the extracted candidate nipple areas so that only the actual nipple areas remain, and then assesses image harmfulness.

The rest of this paper is organized as follows. Section 2 describes existing studies on adult content detection in the image processing and multimedia areas. Section 3 explains a method to extract facial elements from input images based on MCT features. Section 4 explains a technique to screen out candidate nipple areas from images by utilizing color information. Section 5 describes a technique to assess image harmfulness by verifying the candidate nipple areas with a neural network. Section 6 presents the results of experiments conducted to compare and evaluate the performance of the proposed method. Section 7 presents the study's conclusions and directions for future work.

2. Related Work

With the development of the computing performance and network functions of mobile devices such as smart phones and tablet PCs, adult multimedia content, including naked images and adult videos, is also being distributed freely. In this situation, the need for technologies to automatically screen out such adult content is increasing. The relevant literature introduces methods for automatic detection of adult content as follows.

The content-based image retrieval method [10] first removes the background area from images by utilizing the skin color distribution and acquires areas of interest in a square form. Next, this method extracts color, texture, and shape features from each image and searches for the 100 images most similar to the input image in an image database consisting of adult and nonadult images. If the number of adult images among the search results exceeds a predefined threshold, the given image is viewed as an adult image; if not, it is viewed as a nonadult image. In other words, the image retrieval-based method resolves the issue of adult image screening through image classification. Figure 2 shows the overall structure of the image retrieval-based method.
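As a rough illustration of this decision rule, the sketch below (in C++ with OpenCV, the tools used later in this paper) counts adult images among the 100 database entries nearest to a query feature vector and compares that count with a predefined threshold. The plain Euclidean distance and the DbEntry structure are simplifying assumptions for illustration, not the descriptors or index actually used in [10].

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <utility>
#include <vector>

struct DbEntry {
    cv::Mat feature;   // precomputed color/texture/shape feature vector (1 x N, CV_32F)
    bool isAdult;      // ground-truth label of the database image
};

// Classify a query by counting adult images among its 100 nearest neighbours.
bool classifyByRetrieval(const cv::Mat& queryFeature,
                         const std::vector<DbEntry>& database,
                         int adultCountThreshold)          // predefined threshold
{
    std::vector<std::pair<double, bool>> scored;           // (distance, isAdult)
    for (const DbEntry& e : database)
        scored.emplace_back(cv::norm(queryFeature, e.feature, cv::NORM_L2), e.isAdult);

    const size_t k = std::min<size_t>(100, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end());

    int adultCount = 0;
    for (size_t i = 0; i < k; ++i)
        if (scored[i].second) ++adultCount;

    return adultCount > adultCountThreshold;               // adult if the count exceeds the threshold
}
```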

The shape feature-based method [11] utilizes the multi-Bayes classifier [15, 16] to detect areas with a human skin color distribution more precisely. For precise skin detection, the human skin color detection procedure consists of two phases: a skin pixel detection phase and a skin area refinement phase. From the detected skin areas, shape features are extracted and input into a boosted classifier to assess whether the input image contains nudity. In that work, the shape features were built from three simple shape descriptors (eccentricity, compactness, and rectangularity), the seven moment invariants presented by Hu [17], and Zernike moments [18, 19]. Different boosted classifiers and shape features were utilized to compare and test the performance of the adult image detection algorithm.
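The sketch below computes the kind of simple shape descriptors named above (eccentricity, compactness, rectangularity, and Hu's seven moment invariants) for one region contour with OpenCV; the exact normalization used in [11] may differ, and the Zernike moments are omitted here.

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Build a small shape feature vector for one skin region given its outer contour.
std::vector<double> shapeDescriptors(const std::vector<cv::Point>& contour) {
    std::vector<double> f;

    double area      = cv::contourArea(contour);
    double perimeter = cv::arcLength(contour, true);
    cv::RotatedRect box = cv::minAreaRect(contour);
    double w = box.size.width, h = box.size.height;
    double major = std::max(w, h), minor = std::min(w, h);

    // Eccentricity, approximated from the axes of the minimum-area rectangle.
    f.push_back(major > 0 ? std::sqrt(1.0 - (minor * minor) / (major * major)) : 0.0);
    // Compactness: perimeter^2 / area (a circle gives the minimum value, 4*pi).
    f.push_back(area > 0 ? (perimeter * perimeter) / area : 0.0);
    // Rectangularity: region area relative to its minimum bounding rectangle.
    f.push_back((w * h) > 0 ? area / (w * h) : 0.0);

    // Hu's seven moment invariants.
    cv::Moments m = cv::moments(contour);
    double hu[7];
    cv::HuMoments(m, hu);
    f.insert(f.end(), hu, hu + 7);

    return f;   // 10-dimensional shape feature vector for a boosted classifier
}
```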

The breast area detection-based method [12] uses a mean intensity filter and the Hough transform [20, 21] to identify breast areas in images and screen out adult images. This adult image identification method consists largely of a learning phase, a recognition phase, and a test phase. In the learning phase, the system learns breast nipple images to build a nipple intensity filter used in the recognition phase. In the recognition phase, edges are extracted from the input image, and connected components are obtained by utilizing the edge density. Next, by considering the length-to-width ratio of each connected component, the system determines candidate nipple regions. The system measures the similarity between the learned nipple intensity filter and each candidate nipple region in the input image and selects the region with the highest similarity as the final candidate nipple area. The Hough transform is utilized to detect the breast line in the image. In the test phase, the locations of the breast lines and candidate nipple areas determined in the recognition phase are considered to make the final assessment of the harmfulness of the corresponding image.

The human skin color model-based method [13] utilizes adaptive and extensible skin color distribution models that are robust to the color cast caused by special lighting effects [22] to segment human skin color regions. Instead of using a predefined skin color model, this approach learns human skin colors from the input images themselves and updates its model adaptively. Then, multiple features, in particular the texture smoothness of the extracted skin color areas, are applied to verify the segmented skin areas and to assess whether the input image is an adult image. For effective learning of the skin color distribution, a multilayer feedforward neural network [23, 24] is employed.

In addition to the methods described above, many other techniques for adult content detection have also been developed [14]. Although such methods may ensure some accuracy on some databases, they are not accurate enough to handle the wide variety of images taken in different environments.

3. Detection of Face Regions Using MCT

In this paper, MCT features are used to detect the facial area in input images [25, 26]. MCT features are region-based features that use (0, 1) binary information within a $3 \times 3$ kernel. In other words, the mean of the kernel is calculated, and each pixel in the kernel is assigned the value 1 if it is larger than the mean and 0 if it is smaller. Thus, a total of $2^9 - 1 = 511$ distinct MCT feature values can be produced in the kernel. In general, MCT features utilize regional information that is less sensitive to lighting changes and require only a simple calculation. Therefore, they deliver a high detection rate and quick processing time for face detection in the multimedia area.

Normally, the MCT feature for a $3 \times 3$ kernel can be calculated using

$$\Gamma(x) = \bigotimes_{y \in N'(x)} \zeta\bigl(\bar{I}(x), I(y)\bigr). \qquad (1)$$

In (1), $I(y)$ represents the brightness of pixel $y$, and $\bar{I}(x)$ is the average brightness of the pixels in the kernel centered at $x$. $N'(x)$ represents the center of the kernel and its adjacent pixels. $\zeta$ is a comparison function: if $I(y)$ is larger than the average brightness, $\zeta$ has a value of 1; if not, it has a value of 0. $\bigotimes$ is a decimal conversion operator that changes the 9-digit binary number resulting from $\zeta$ into a decimal number. Thus, the MCT features used in this paper range from 0 to 510. Figure 3 shows an example of the MCT transformation.
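A minimal sketch of this transform, assuming an 8-bit grayscale input, is given below; it follows the definition in (1) directly rather than any optimized implementation.

```cpp
#include <opencv2/opencv.hpp>

// Compute the MCT image of an 8-bit grayscale input (border pixels are left 0).
cv::Mat computeMCT(const cv::Mat& gray) {
    CV_Assert(gray.type() == CV_8UC1);
    cv::Mat mct = cv::Mat::zeros(gray.size(), CV_16UC1);   // feature values 0..510

    for (int y = 1; y < gray.rows - 1; ++y) {
        for (int x = 1; x < gray.cols - 1; ++x) {
            // Mean brightness of the 3x3 kernel centred at (x, y).
            double mean = 0.0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    mean += gray.at<uchar>(y + dy, x + dx);
            mean /= 9.0;

            // Comparison function: bit is 1 if the pixel is brighter than the mean.
            int code = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    code = (code << 1) | (gray.at<uchar>(y + dy, x + dx) > mean ? 1 : 0);

            mct.at<ushort>(y, x) = static_cast<ushort>(code);  // 9-bit pattern as a decimal value
        }
    }
    return mct;
}
```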

The MCT features identified in the input image are applied to a face detection classifier generated by an AdaBoost learner [27] to first screen out a facial area. Within the detected facial area, the eye and lip areas are then extracted with the EyeMap and LipMap [28, 29]. The EyeMap-based method works in the YCbCr color space. It combines the luminance channel-based EyeMapL and the chrominance channel-based EyeMapC through an AND operation to produce the EyeMap. The LipMap-based method estimates the lip area colors based on the colors of the overall skin area. It calculates a variable from the overall skin area color and uses this to generate the LipMap.
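A rough sketch of this step is shown below. It follows the commonly cited YCbCr formulations of the EyeMap and LipMap [28, 29]; the structuring element size and the way the two eye maps are combined (a pixelwise product here rather than the AND operation mentioned above) are assumptions, not the authors' exact settings.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Build eye and lip maps for a face region given in BGR colour order.
void buildEyeLipMaps(const cv::Mat& bgr, cv::Mat& eyeMap, cv::Mat& lipMap) {
    cv::Mat ycrcb;
    cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);
    std::vector<cv::Mat> ch;
    cv::split(ycrcb, ch);                              // ch[0]=Y, ch[1]=Cr, ch[2]=Cb

    cv::Mat Y, Cr, Cb;
    ch[0].convertTo(Y, CV_32F);
    ch[1].convertTo(Cr, CV_32F);
    ch[2].convertTo(Cb, CV_32F);

    // Chrominance eye map: eye regions show high Cb and low Cr responses.
    cv::Mat cb2 = Cb.mul(Cb);
    cv::Mat crNeg = 255.0f - Cr;
    cv::Mat crNeg2 = crNeg.mul(crNeg);
    cv::Mat crPlus1 = Cr + 1.0f;
    cv::Mat cbOverCr = Cb / crPlus1;
    cv::normalize(cb2, cb2, 0, 255, cv::NORM_MINMAX);
    cv::normalize(crNeg2, crNeg2, 0, 255, cv::NORM_MINMAX);
    cv::normalize(cbOverCr, cbOverCr, 0, 255, cv::NORM_MINMAX);
    cv::Mat eyeMapC = (cb2 + crNeg2 + cbOverCr) / 3.0f;

    // Luminance eye map: eyes are dark blobs surrounded by brighter skin.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::Mat dil, ero;
    cv::dilate(Y, dil, kernel);
    cv::erode(Y, ero, kernel);
    cv::Mat eroPlus1 = ero + 1.0f;
    cv::Mat eyeMapL = dil / eroPlus1;
    cv::normalize(eyeMapL, eyeMapL, 0, 255, cv::NORM_MINMAX);

    eyeMap = eyeMapC.mul(eyeMapL);                     // combine the two maps

    // Lip map: lips have a stronger Cr response relative to Cb than skin does.
    cv::Mat cr2 = Cr.mul(Cr);
    cv::Mat cbPlus1 = Cb + 1.0f;
    cv::Mat crOverCb = Cr / cbPlus1;
    double eta = 0.95 * cv::mean(cr2)[0] / cv::mean(crOverCb)[0];
    cv::Mat diff = cr2 - eta * crOverCb;
    cv::Mat diff2 = diff.mul(diff);
    lipMap = cr2.mul(diff2);
}
```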

Figure 4 shows the results of eye and lip area detection obtained by applying the EyeMap and LipMap to the face areas detected using the MCT features. Figure 4(a) is an example of eye area detection, and Figure 4(b) is an example of lip area detection.

The minimum enclosing rectangle containing the detected eye and lip areas is selected as the final face region. In this paper, the face area, including the extracted eye and lip regions, is used to effectively filter out the candidate nipple areas extracted in the subsequent phase. In other words, because actual nipple areas cannot exist inside a human face, any candidate area located inside the face region is regarded as a nonnipple area and removed.
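A small sketch of this filtering rule, assuming both face regions and candidate areas are represented as rectangles:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Discard any nipple candidate whose centre falls inside a detected face rectangle.
std::vector<cv::Rect> removeCandidatesInFaces(const std::vector<cv::Rect>& candidates,
                                              const std::vector<cv::Rect>& faces)
{
    std::vector<cv::Rect> kept;
    for (const cv::Rect& c : candidates) {
        cv::Point centre(c.x + c.width / 2, c.y + c.height / 2);
        bool insideFace = false;
        for (const cv::Rect& f : faces)
            if (f.contains(centre)) { insideFace = true; break; }
        if (!insideFace)
            kept.push_back(c);        // keep only candidates outside every face region
    }
    return kept;
}
```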

4. Extraction of Nipple Regions

In this paper, to assess the harmfulness of diverse input images, the existence of female nipple areas in the images is determined. To this end, human skin areas are first detected using a predefined oval-shaped human skin color distribution model [28].
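The sketch below illustrates this kind of elliptical skin color test in the Cb-Cr plane. The ellipse center, axes, and rotation used here are placeholder values for illustration, not the parameters of the model in [28].

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>

// Mark pixels whose (Cb, Cr) values fall inside a rotated ellipse as skin.
cv::Mat detectSkinElliptical(const cv::Mat& bgr) {
    cv::Mat ycrcb;
    cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);

    // Placeholder ellipse parameters in (Cb, Cr) coordinates.
    const double cx = 110.0, cy = 152.0;      // ellipse centre
    const double a  = 22.0,  b  = 14.0;       // semi-axes
    const double theta = 2.53;                // rotation angle in radians

    cv::Mat mask(bgr.size(), CV_8UC1, cv::Scalar(0));
    for (int y = 0; y < ycrcb.rows; ++y) {
        for (int x = 0; x < ycrcb.cols; ++x) {
            cv::Vec3b p = ycrcb.at<cv::Vec3b>(y, x);   // (Y, Cr, Cb)
            double cr = p[1], cb = p[2];
            // Rotate the point into the ellipse frame and test the ellipse equation.
            double u =  std::cos(theta) * (cb - cx) + std::sin(theta) * (cr - cy);
            double v = -std::sin(theta) * (cb - cx) + std::cos(theta) * (cr - cy);
            if ((u * u) / (a * a) + (v * v) / (b * b) <= 1.0)
                mask.at<uchar>(y, x) = 255;            // classified as skin colour
        }
    }
    return mask;
}
```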

Then candidate nipple areas are extracted from the skin regions by using the nipple map defined in (2) over the color values of the three channels of the color model at the corresponding pixel coordinates. The nipple map thus utilizes all of the color elements of the color model, and each term of (2) is normalized to the range 0 to 255.

Normally, human nipple areas have reddish color values and relatively darker brightness values. The nipple map was defined based on this fact: in (2), one term emphasizes pixels with a reddish color and lower brightness, while another stresses the nipple area relative to the surrounding skin area.

Table 1 shows a quantitative comparison of the average and standard deviation of the color distributions of the skin area and the nipple area. As shown in Table 1, the skin area, compared with the nipple area, is on average less reddish and brighter.
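Equation (2) itself is not reproduced here, so the following is only an illustrative stand-in that is consistent with the description above: a map that responds strongly to reddish, relatively dark pixels inside the previously detected skin mask and is normalized to the range 0-255.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Illustrative map: high response for reddish (high Cr), dark (low Y) skin pixels.
cv::Mat illustrativeNippleMap(const cv::Mat& bgr, const cv::Mat& skinMask) {
    cv::Mat ycrcb;
    cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);
    std::vector<cv::Mat> ch;
    cv::split(ycrcb, ch);                               // ch[0]=Y, ch[1]=Cr, ch[2]=Cb

    cv::Mat Y, Cr;
    ch[0].convertTo(Y, CV_32F);
    ch[1].convertTo(Cr, CV_32F);

    // Placeholder weighting: redness multiplied by darkness.
    cv::Mat darkness = 255.0f - Y;
    cv::Mat map = Cr.mul(darkness);

    cv::normalize(map, map, 0, 255, cv::NORM_MINMAX);   // scale to 0..255
    map.convertTo(map, CV_8UC1);

    cv::Mat masked = cv::Mat::zeros(map.size(), CV_8UC1);
    map.copyTo(masked, skinMask);                       // keep skin pixels only
    return masked;
}
```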

In the nipple map image obtained by applying the nipple map to the skin area, pixels are marked brighter the higher their likelihood of belonging to a nipple area, and darker the lower that likelihood. Next, the extracted nipple map image is binarized and labeled to extract candidate nipple areas. The Otsu method is used to binarize the nipple map image [30, 31]. This method is known to deliver the best performance when the brightness histogram is a mixture of two probability density functions. The Otsu method finds the optimal threshold for binarizing the brightness histogram statistically, without prior knowledge, and is one of the most frequently used binarization algorithms in the image processing and computer vision fields.

After the candidate nipple areas are detected by binarizing the nipple map image, a morphological opening operation is applied to remove relatively small areas, such as noise [32, 33]. Morphological image processing is a collection of nonlinear operations related to the shape, or morphology, of features in an image. Morphological operations rely only on the relative ordering of pixel values, not on their numerical values, and are therefore especially suited to processing binary images. The opening operation performs a dilation after an erosion; it removes areas smaller than a certain size while leaving the other regions close to their original sizes and smoothing their boundaries.
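A sketch of this candidate-extraction step with OpenCV is shown below (Otsu binarization, an opening with a 3 × 3 structuring element, and connected-component labeling); the structuring element size is an assumption.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Binarize the nipple map, clean it up, and return bounding boxes of the blobs.
std::vector<cv::Rect> extractCandidates(const cv::Mat& nippleMap8u) {
    // Otsu automatically selects the binarization threshold from the histogram.
    cv::Mat binary;
    cv::threshold(nippleMap8u, binary, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // Morphological opening (erosion followed by dilation) removes small noise blobs.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::morphologyEx(binary, binary, cv::MORPH_OPEN, kernel);

    // Label the remaining regions and return their bounding boxes as candidates.
    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(binary, labels, stats, centroids);

    std::vector<cv::Rect> candidates;
    for (int i = 1; i < n; ++i)                         // label 0 is the background
        candidates.emplace_back(stats.at<int>(i, cv::CC_STAT_LEFT),
                                stats.at<int>(i, cv::CC_STAT_TOP),
                                stats.at<int>(i, cv::CC_STAT_WIDTH),
                                stats.at<int>(i, cv::CC_STAT_HEIGHT));
    return candidates;
}
```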

5. Determination of Harmfulness Using Neural Network

After the candidate nipple areas are extracted in the previous phase, MCT features and an artificial neural network are utilized to remove nonnipple areas so that only the actual nipple regions remain.

In other words, this paper builds a learning database of nipple images normalized to a fixed pixel size to learn nipple areas. Next, MCT features are extracted from each nipple image, and the set of extracted MCT features is learned by the artificial neural network to generate a nipple classifier. Although the MCT is sensitive to rotation, this limitation is mitigated because nipples are roughly circular. Lastly, the generated nipple classifier is used to verify the candidate nipple areas and make the final assessment of image harmfulness. That is, as in (3), if a female nipple area is judged to be exposed, the system regards the input image as harmful; if not, it regards the input image as not harmful:

$$\text{harmfulness}(I_t) = \begin{cases} \text{harmful}, & \text{if a nipple area } N_i \text{ is detected in } I_t, \\ \text{not harmful}, & \text{otherwise.} \end{cases} \qquad (3)$$

In (3), $N_i$ represents the $i$th nipple area included in an image, and $I_t$ means the input image at time $t$.

This paper proposes an algorithm that learns and then recognizes nipple areas using a layered hierarchical artificial neural network. The network is trained with the error backpropagation algorithm [34] and uses one hidden layer. The activation function is the binary sigmoid function. The hierarchical artificial neural network consists of 511 input nodes, 128 hidden nodes, and 1 output node.
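The sketch below sets up and trains a single stage with this topology using OpenCV's ANN_MLP. The 511-dimensional input is assumed to be a histogram over the possible MCT values of a normalized candidate patch, OpenCV's symmetric sigmoid stands in for the binary sigmoid named above, and the learning-rate, momentum, and termination values are placeholders.

```cpp
#include <opencv2/opencv.hpp>

// Train one 511-128-1 stage of the nipple classifier with backpropagation.
cv::Ptr<cv::ml::ANN_MLP> trainNippleStage(const cv::Mat& trainFeatures,   // N x 511, CV_32F
                                          const cv::Mat& trainLabels)     // N x 1,   CV_32F (0 or 1)
{
    cv::Ptr<cv::ml::ANN_MLP> net = cv::ml::ANN_MLP::create();

    cv::Mat layers = (cv::Mat_<int>(3, 1) << 511, 128, 1);      // input, hidden, output nodes
    net->setLayerSizes(layers);
    net->setActivationFunction(cv::ml::ANN_MLP::SIGMOID_SYM);   // OpenCV's sigmoid variant
    net->setTrainMethod(cv::ml::ANN_MLP::BACKPROP, 0.1, 0.1);   // placeholder rate and momentum
    net->setTermCriteria(cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS,
                                          1000, 1e-4));

    net->train(cv::ml::TrainData::create(trainFeatures, cv::ml::ROW_SAMPLE, trainLabels));
    return net;
}
```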

The hierarchical artificial neural network used in this paper feeds detected samples back for relearning, using a threshold value that produces a 99% detection rate and a 50% false detection rate, and repeats this process over 6 layers. Candidate nipple areas are normalized to the same fixed size and then tested by the hierarchical artificial neural network. Figure 5 gives an overview of the classification process of the hierarchical artificial neural network.
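A sketch of this stage-by-stage verification, assuming six trained stages and one acceptance threshold per stage chosen during training:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Pass a candidate's feature vector through all stages; reject on the first failure.
bool isNippleRegion(const cv::Mat& feature,                                  // 1 x 511, CV_32F
                    const std::vector<cv::Ptr<cv::ml::ANN_MLP>>& stages,     // 6 trained stages
                    const std::vector<float>& thresholds)                    // one threshold per stage
{
    for (size_t i = 0; i < stages.size(); ++i) {
        cv::Mat response;
        stages[i]->predict(feature, response);
        if (response.at<float>(0, 0) < thresholds[i])
            return false;             // rejected by this stage: treated as a nonnipple area
    }
    return true;                      // accepted by every stage: actual nipple region
}
```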

The candidate nipple areas extracted in this process may each contain multiple nipple areas. Thus, if the size of a candidate nipple area, defined as in (4), is relatively large compared with the overall image, our experiment attempts nipple detection again within that candidate region in order to divide it into smaller regions.

6. Experimental Results

The computer used in this paper has an Intel Core i7 2.93 GHz CPU and 8 GB of memory, and the operating system is Microsoft Windows 7. The proposed harmfulness detection method was implemented with Microsoft Visual C++ and OpenCV. To compare and evaluate the performance of the proposed algorithm, diverse types of adult and nonadult images were collected, captured in ordinary outdoor and indoor environments without any specific constraint.

Figure 6(a) shows an adult image and Figure 6(b) shows the result of nipple map estimation from the input image.

Figures 7(a) and 7(b) display the examples of final nipple area detection based on the proposed method.

To quantitatively evaluate the performance of the proposed image harmfulness assessment method, the accuracy criteria defined in (5) were employed. In these equations, $TP$ represents the number of accurately detected nipple areas, $FP$ is the number of misdetected areas that are actually nonnipple areas, and $FN$ is the number of nipple areas that are not detected. Precision represents the ratio of accurately detected nipple areas to all nipple areas detected in an input image, and recall represents the ratio of accurately detected nipple areas to all nipple areas that actually exist in the given image. In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance [35]:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}. \qquad (5)$$
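A minimal sketch of these measures, with example counts in place of real experimental results:

```cpp
#include <cstdio>

int main() {
    // Example counts only; replace with the values obtained from an experiment.
    int tp = 90;   // accurately detected nipple areas
    int fp = 10;   // misdetected areas that are actually nonnipple areas
    int fn = 15;   // nipple areas that were not detected

    double precision = static_cast<double>(tp) / (tp + fp);   // TP / (TP + FP)
    double recall    = static_cast<double>(tp) / (tp + fn);   // TP / (TP + FN)

    std::printf("precision = %.3f, recall = %.3f\n", precision, recall);
    return 0;
}
```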

In this paper, the proposed method was compared with the existing intensity filter-based method and the geometric filtering-based method to test its accuracy. Figures 8 and 9 display the accuracy test results of the proposed algorithm computed with (5). As shown in Figures 8 and 9, the hierarchical artificial neural network-based method was found to reduce the misdetection rate, providing higher accuracy in image harmfulness assessment.

Of the three methods, the intensity filter-based method showed the lowest accuracy; because color information is not sufficiently utilized, it produced many errors. The geometric filtering-based method detects candidate nipple areas first and then removes nonnipple areas using key geometric features; it can cause misdetections because more precise filtering is difficult. The proposed method employs the hierarchical artificial neural network to learn the main features of human nipple areas in depth before detection, achieving higher accuracy. However, if the main parameters used in the suggested algorithm are not tuned sufficiently during initialization, the accuracy of image harmfulness detection may be somewhat low. In addition, when nipple areas lie in parts of the input image where the picture quality is degraded, the precision of the proposed method may also decrease.

7. Conclusions

This paper proposed a new method of automatically assessing the harmfulness of input images based on an artificial neural network. The proposed method first detects a human face area in the input image by using the MCT features. Next, based on the color features, the method obtains human skin areas and extracts candidate nipple areas. Lastly, among the candidate nipple areas, nonnipple areas are removed using the hierarchical artificial neural network and actual nipple areas are robustly detected to finally assess image harmfulness.

In the experiment, the proposed algorithm was applied to diverse types of adult and nonadult images captured in usual indoor and outdoor environments without a specific constraint to test its performance. As a result, the proposed method using the hierarchical artificial neural network was found to provide more robust detection performance than other existing methods. In other words, the proposed method showed a higher accuracy as it first learns the features of nipple areas through the neural network and then detects nipple areas.

Future work will include determining the harmfulness of various types of input images more effectively by dividing harmfulness into multiple levels instead of the present two statuses, harmful and not harmful. We will also attempt to tune the predefined parameters used in the suggested algorithm adaptively to improve the stability of the overall system. In addition, how to consider human body parts other than the nipple area that represent harmfulness, such as navels, hips, and genital regions, will also be explored.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (2011-0021984).