Abstract

The semantic-based facial image-retrieval system is concerned with retrieving facial images based on the semantic information of query images and database images. The image-retrieval systems discussed in the literature have drawbacks that degrade the performance of facial image retrieval. To reduce the drawbacks of the existing techniques, we propose an efficient semantic-based facial image-retrieval (SFIR) system using adaptive particle swarm optimization (APSO) and the squared Euclidean distance (SED). The proposed technique consists of three stages: feature extraction, optimization, and image retrieval. Initially, features are extracted from the database images. Two kinds of features are used in the feature-extraction process: low-level features (shape, color, and texture) and high-level features (face, mouth, nose, left eye, and right eye). In the second stage, the semantic gap between these features is reduced by the well-known adaptive particle swarm optimization (APSO) technique. Afterward, a squared Euclidean distance (SED) measure is utilized to retrieve the face images that have the smallest distance to the query image. The proposed SFIR system with APSO-SED is implemented in the MATLAB working platform, and its performance is analyzed.

1. Introduction

Due to the popularity of digital devices and the rise of social network/photo sharing services, the availability of consumer photos is increasing [1, 2]. A large percentage of those photos contain human faces [3]. The importance and the sheer amount of human face photos make the manipulation (e.g., search and mining) of large-scale human face images an important research problem and enable many real-world applications [4]. Facial images have gained importance amongst digital images due to their use in various aspects of life, such as in airports, law enforcement applications, security systems, and automated surveillance applications [5]. Law enforcement and criminal investigation agencies typically maintain large image databases of human faces [6]. Such databases consist of faces of individuals who have either committed crimes or are suspected of having been involved in criminal activities in the past. Composite drawings are used in identifying a potential suspect from an image database [7]. Retrieval from these databases is performed in the context of the following activities: the matching of composite drawings, Bann file searching, and the ranking of a photo lineup [1].

The face-retrieval problem is concerned with retrieving facial images that are relevant to users’ requests from a collection of images. Retrieval is based on the visual contents and/or information associated with the facial image [8]. The retrieval system is expected to display the images in the database that match the sketch. Bann file searching is performed when a (suspected) criminal at hand does not disclose his/her legitimate identification information, to enable a law enforcement/criminal investigator to retrieve the criminal’s past history [9]. Under such circumstances, the investigator visually scans the criminal’s face to extract certain features and uses them in performing the retrieval. In the ranking of a photo lineup, the person performing the retrieval provides a vague and often uncertain set of features of a face and expects the system to provide a ranking of the faces in the database that match the feature descriptions [10]. The victim or an eyewitness of a crime describes the facial features of the perpetrator of the crime to a forensic composite technician [11]. The human face is the most significant component of the human body that people use for identification, verification, and the recognition of others; therefore, facial images are the most common biometric characteristic used for human verification and identification [10].

The five major aspects of face study are the representation of faces, the detection of faces, the identification of faces, the analysis of facial expressions, and the classification of faces based on physical features [5]. Human attributes (e.g., gender, race, hairstyle, and hair color) are high-level semantic descriptions of a person. Using these human attributes, many researchers have achieved promising results in different applications such as face verification, identification, and keyword-based face-image retrieval [12, 13]. The retrieval task involves two types of target images. One is face images with the same identity as the query face [14]. The other is face images that have an appearance similar to the query face [7]. The description of a face given by people is almost always semantic in nature, using verbal terms such as “long face,” “thick lipped,” or “blonde haired.” Semantic face retrieval involves the retrieval of face images based not on the raw image content but on the semantics of the facial features, such as the description of a person’s nose, eyes, lips, or chin. A semantic face-retrieval system composes a face from its semantic description and then matches the composed image with the images in the database [15]. Face-recognition systems are in high demand in airports and other public places for automated surveillance applications [16]. Most of the abovementioned applications use the face as a hard biometric for the verification or identification of a person and consist primarily of the task of matching the actual image of a face with those stored in a database of face images [2]. However, apart from their use as a hard biometric, the “soft” traits of face modality are used to group people instead of uniquely identifying a person by his/her face. Face images have been used to identify a person’s ethnicity, gender, and age [6, 17]. A more interesting application that views the face as a soft biometric is face-retrieval systems. In many law enforcement applications, the soft biometric [12] traits of the face have to be matched to retrieve the target face image from a dataset [18].

The remainder of this paper is organized as follows. Section 2 provides brief descriptions of recent related research. Section 3 defines the problem that motivates this work. Section 4 presents our proposed semantic-based facial image retrieval using APSO-SED. Section 5 presents our experimental results and discussion, and Section 6 concludes the paper.

2. Related Work

Park and Jain [19] proposed a method to utilize demographic information and facial marks to improve face-image matching and retrieval performance. They developed an automatic facial mark-detection method that uses (1) the active appearance model for locating primary facial features (e.g., eyes, nose, and mouth), (2) Laplacian-of-Gaussian blob detection, and (3) morphological operators. The experimental results based on the FERET database (426 images of 213 subjects) and two mug-shot databases from the forensic domain (1225 images of 671 subjects and 10 000 images of 10 000 subjects, respectively) showed that the use of soft biometric traits was able to improve the face-recognition performance of a state-of-the-art commercial matcher. This method does not capture the semantic aspects of a face. Humans by nature tend to use verbal descriptions of semantic features (high-level features) to describe what they are looking for, and they encounter difficulty using the language of low-level features. In our system, however, we propose a new face-image-retrieval technique that incorporates semantic face aspects.

Thang et al. [6] proposed a content-based facial image-retrieval system based on constrained independent component analysis (CICA). Originating from independent component analysis (ICA), CICA is a source-separation technique that uses a priori constraints to extract the desired independent components (ICs) from data. By providing query images as the constraints to the CICA, the ICs that share similar probabilistic features with the queries can be extracted from the database. Then, these extracted ICs were used to rank each image according to the query. The experimental results of their CBIR system tested with different facial databases showed that the system improved the retrieval performance by using a compound query, although the retrieved images had lower accuracy. The accuracy of the image retrieval is improved in our proposed system.

Iqbal et al. [20] proposed an image-retrieval technique for biometric security purposes based on color, texture, and shape features controlled by fuzzy heuristics. The approach was based on three well-known algorithms: the color histogram, the Gabor filter, and moment invariants. The color histogram was used to extract the color features of an image using four components: red, green, blue, and intensity. The Gabor filter was used to extract the texture features, and the Hu moment invariants were used to extract the shape features of an image. The evaluation was carried out using the standard precision and recall measures, and the results were compared with existing approaches. The presented results showed that the proposed approach produced better results than previous methods. The study used color, texture, and shape features in the image retrieval, but the accuracy was low. The accuracy of the image retrieval is improved in our proposed APSO system.

Alattab and Kareem [21] proposed a method for semantic feature selection using natural language concepts. The strategy was to bridge the semantic gap between low-level image features and high-level semantic concepts. The semantic features were also integrated directly with the eigenfaces and color-histogram features for facial image searching and retrieval to enhance retrieval accuracy for the user. The Euclidean distance was used for integrating and classifying the feature classes. The proposed human facial image retrieval was evaluated through several experiments using precision and recall measures. The results indicated high accuracy, which was considered a significant improvement over low-level feature-based facial image-retrieval techniques. Although semantic features are used to retrieve images, a semantic gap remains in this technique. To fill this semantic gap, we propose an innovative method known as adaptive particle swarm optimization (APSO).

Wang et al. [22] proposed an effective Weak Label Regularized Local Coordinate Coding (WLRLCC) technique, which exploits the principle of local coordinate coding by learning sparse features and employs the idea of graph-based weak label regularization to enhance the weak labels of similar facial images. The authors proposed an efficient optimization algorithm to solve the WLRLCC problem. They conducted extensive empirical studies on several facial image web databases to evaluate the proposed WLRLCC algorithm from different aspects. The experimental results validated its efficacy. The authors used local coordinate coding, which is a complicated optimization technique. Hence, we propose an adaptive particle swarm optimization algorithm.

3. Problem Definition

The ability to search through images is becoming increasingly important as the number of available images rises drastically. However, it is impractical for humans to label every available image and dictate which images are similar to which other images. Content-based image retrieval (CBIR) is an image-retrieval technique that uses the visual content of an image to find and retrieve the required images from databases. Facial images differ from other images in CBIR because they are complex, multidimensional, and similar to one another in their overall configuration. Many techniques have been proposed in the field of CBIR with facial images from face databases. In the existing methods, image-retrieval systems primarily use low-level visual features such as color, texture, and shape. Visual features are extracted automatically using image-processing methods to represent the raw content of the image. Image retrieval based on color usually yields images with similar colors, and image retrieval based on shape yields images that clearly have the same shape [23]. Therefore, general-purpose image-retrieval systems that use low-level features are not effective with facial images, particularly when the user’s query is a verbal description. Such descriptions do not capture the semantic aspects of a face. Humans by nature tend to use verbal descriptions of semantic features (high-level features) to describe what they are looking for, and they encounter difficulty using the language of low-level features. One recently developed CBIR system [19] for biometric security is applied to facial images with low-level features. This method achieved approximately 95% accuracy in face-image retrieval, but it does not apply to real-time data.

Moreover, CBIR based on low-level features does not produce accurate results for face images of the same people at different ages. Hence, the existing systems have drawbacks in retrieving facial images taken at different ages; that is, semantic features are not considered during retrieval.

4. The Proposed Semantic-Based Facial Image Retrieval Using APSO and Squared Euclidean Distance (SED)

We propose an efficient semantic-based facial image-retrieval (SFIR) system using APSO and the squared Euclidean distance (SED). This technique retrieves more relevant images for the query images by using the APSO algorithm. The architecture of our proposed method is shown in Figure 1.

Our proposed semantic-based facial image retrieval is comprised of three stages:
(i) Feature extraction:
  (a) Low-level features:
    (1) Shape (canny edge detection).
    (2) Texture (local binary pattern).
    (3) Color (color histogram equalization).
  (b) High-level features:
    (1) Semantic features (face, mouth, nose, and left and right eyes).
(ii) Optimization:
  (a) Adaptive particle swarm optimization.
(iii) Image retrieval:
  (a) Squared Euclidean distance.

We consider an image database $D = \{I_1, I_2, \ldots, I_m\}$, where $m$ is the total number of images in database $D$. Similarly, the query image database is defined as $Q = \{q_1, q_2, \ldots, q_n\}$, where $n$ represents the total number of query images in query database $Q$. Hence,
$$D = \{I_i\}_{i=1}^{m}, \qquad Q = \{q_j\}_{j=1}^{n}. \tag{1}$$
Initially, face images of a large number of persons at different ages are collected and stored in the face database. Let $D$ be a face database containing $m$ face images and let $Q$ be the database of $n$ query images. After the face images are collected, the initial stage is feature extraction. In the feature-extraction stage, low-level features such as shape, texture, and color and high-level features such as the face, mouth, nose, left eye, and right eye are extracted from the database images and the query images. Afterward, based on the obtained features, the different-age face images are classified by generating rules, and semantic labels are created for those face images. To obtain better image-retrieval results, the semantic gap between the database image features and the query image features is reduced by a well-known optimization method, adaptive particle swarm optimization (APSO). Thus, we obtain a reduced set of features. Afterward, a squared Euclidean distance (SED) measure is utilized to retrieve the face images that have the smallest distance from the query image.

4.1. Feature Extraction

(i) Low-Level Features:
  (1) Shape (canny edge detection).
  (2) Texture (local binary pattern).
  (3) Color (color histogram equalization).

4.1.1. Shape Feature Extraction Using Canny Edge-Detection Technique

This method effectively detects the edges of a given input face image $I_i$ and query image $q_j$. The edges of the face images are detected using the canny edge-detection algorithm. The process of the edge-detection algorithm is described in detail below.

Step 1 (smoothing). Any noise present in the image is filtered out by the smoothing process with the help of a Gaussian filter. The Gaussian mask, which is smaller than the actual image, is slid over the image to manipulate one square of pixels at a time.

Step 2 (finding gradients). From the smoothed image, the gradients are found using the Sobel operator. In the Sobel operator, a pair of 3 × 3 convolution masks is utilized to find the gradients in the $x$- and $y$-axis directions. The mask for the gradient in the $x$-axis direction is given in
$$K_{Gx} = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}. \tag{2}$$
The mask for the gradient in the $y$-axis direction is given in
$$K_{Gy} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}. \tag{3}$$
The magnitude of the gradient, also called the edge strength, is approximated from the results of (2) and (3) using
$$|G| = |G_x| + |G_y|. \tag{4}$$
The locations and directions of the edges are then determined using
$$\theta = \arctan\left(\frac{|G_y|}{|G_x|}\right). \tag{5}$$

Step 3 (nonmaximum suppression). To obtain sharp edges from the blurred edges, this step considers only the local maxima of the gradients. The gradient direction is rounded to the nearest 45°, corresponding to its 8-connected neighborhood. The magnitude of the current pixel is compared with the magnitudes of the pixels in the positive and negative gradient directions. If the magnitude of the current pixel is the greatest, that pixel’s magnitude is preserved; otherwise, it is suppressed.

Step 4 (thresholding). The edge pixels that remain after the nonmaximum-suppression process are subjected to thresholding, with two thresholds chosen to find only the true edges in the image. Edge pixels stronger than the high threshold are marked as strong edge pixels, and edge pixels weaker than the low threshold are suppressed. Edge pixels between the two thresholds are marked as weak edge pixels.

Step 5 (edge tracking). Edge pixels are divided into connected BLOBs (Binary Large Objects) using an 8-connected neighborhood. The BLOBs that have at least one of the strong edge pixels are preserved, and the other BLOBs without strong edge pixels are suppressed.

Thus, from the face image $I_i$ and the query image $q_j$, we obtain the shape features $F_s^{I}$ and $F_s^{q}$.
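To make Steps 1–5 concrete, the following is a minimal sketch using OpenCV, whose cv2.Canny routine implements the gradient, suppression, thresholding, and tracking steps internally; the file path, Gaussian kernel size, and hysteresis thresholds are illustrative assumptions, not values from the paper.

```python
import cv2

# Read a face image in grayscale ("face.jpg" is an illustrative path).
image = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)

# Step 1: smooth the image with a Gaussian mask to filter out noise.
smoothed = cv2.GaussianBlur(image, (5, 5), sigmaX=1.4)

# Steps 2-5: Sobel gradients, nonmaximum suppression, double
# thresholding, and edge tracking by hysteresis, all inside cv2.Canny.
edges = cv2.Canny(smoothed, threshold1=50, threshold2=150)

# The binary edge map serves as the shape feature for this image.
shape_feature = edges.ravel()
```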

4.1.2. Texture Feature Extraction Using LBP

(1) Local Binary Pattern. The LBP originally appeared as a generic descriptor. The operator assigns a label to each pixel of an image by thresholding a 3 × 3 neighborhood with the center pixel value and considering the result as a binary number. The resulting binary values are read clockwise starting from the top-left neighbor, as shown in Figure 2.

Given a pixel position $(x_c, y_c)$, LBP is defined as an ordered set of binary comparisons of pixel intensities between the central pixel and its eight surrounding pixels. The resulting 8 bits can be expressed as follows:
$$LBP(x_c, y_c) = \sum_{n=0}^{7} s\left(i_n - i_c\right) 2^n, \qquad s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{6}$$
where $i_c$ corresponds to the grey value of the central pixel and $i_n$ corresponds to the grey values of the surrounding pixels.

Thus, using the local binary pattern, the texture features $F_t^{I}$ and $F_t^{q}$ are extracted from the face image $I_i$ and the query image $q_j$.
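A minimal NumPy sketch of the 3 × 3 operator in (6) is given below; the clockwise neighbor ordering starting from the top-left (as in Figure 2) follows the text, while the function and variable names are illustrative.

```python
import numpy as np

def lbp_3x3(image):
    """Compute the basic 3x3 LBP code of (6) for each interior pixel."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    # Clockwise neighbor offsets, starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy: img.shape[0] - 1 + dy,
                       1 + dx: img.shape[1] - 1 + dx]
        # s(i_n - i_c): contribute 2**bit where neighbor >= center.
        codes |= ((neighbor >= center).astype(np.int32) << bit)
    return codes  # one label in [0, 255] per interior pixel
```

In practice the texture feature is usually taken as the 256-bin histogram of these codes, for example, np.bincount(codes.ravel(), minlength=256).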

4.1.3. Color Feature Extraction Using Histogram

(1) Color Features. Color is one of the most widely used visual features in image retrieval [24]. Here, the color features $F_c^{q}$ and $F_c^{I}$ are extracted by applying a histogram to the query image $q_j$ and the database image $I_i$. A color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges that span the image’s color space, the set of all possible colors. The histogram provides a compact summarization of the distribution of data in an image. The color histogram is a statistic that can be viewed as an approximation of an underlying continuous distribution of color values.

(2) Color Histogram. The color histogram is a $K$-dimensional vector such that each component represents the relative number of pixels of color $k$ in the image, that is, the fraction of pixels that are most similar to the corresponding color. To build the color histogram, the image colors are transformed to an appropriate color space and quantized according to a particular codebook of size $K$:
$$h(k) = \frac{n_k}{N}, \qquad k = 0, 1, \ldots, K - 1. \tag{7}$$
In (7), $K$ is the total number of gray levels in the image, $n_k$ is the number of occurrences of gray level $k$, $N$ is the total number of pixels in the image, and $h(k)$ is the image histogram for pixel value $k$.

Computationally, the color histogram is formed by discretizing the colors within an image and counting the number of pixels of each color. Color descriptors can be global or local. Global descriptors specify the overall color content of the image but contain no information about the spatial distribution of these colors. Local descriptors relate to particular image regions and, in conjunction with the geometric properties of the latter, also describe the spatial arrangement of the colors. By comparing the histogram signatures of two images and matching the color content of one image with another, the color features $F_c^{q}$ and $F_c^{I}$ can be extracted.
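The sketch below builds the normalized histogram of (7) for a color image and compares two histogram signatures; the bin count of 8 per channel, the correlation metric, and the file paths are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def color_histogram(image_bgr, bins=8):
    """Quantized color histogram: a bins**3-dimensional signature."""
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None,
                        [bins] * 3, [0, 256] * 3)
    hist = hist.ravel().astype(np.float32)
    return hist / hist.sum()  # h(k) = n_k / N, as in (7)

# Compare the histogram signatures of a query and a database image
# ("query.jpg" and "db.jpg" are illustrative paths).
query_hist = color_histogram(cv2.imread("query.jpg"))
db_hist = color_histogram(cv2.imread("db.jpg"))
similarity = cv2.compareHist(query_hist, db_hist, cv2.HISTCMP_CORREL)
```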

(ii) High-Level Features. The most natural way people describe a face is by semantically describing its features. Semantic face retrieval refers to the retrieval of face images based not on the raw image content but on the semantics of the facial features, such as descriptions of the face, mouth, nose, and left and right eyes. Here, the high-level semantic features are extracted based on the pixel intensity values, which vary according to the image taken. We use the Viola-Jones algorithm to extract the high-level features.

The Viola-Jones algorithm utilizes a multiscale, multistage classifier that operates on image intensity information. Generally, this approach slides a window across the image and applies a binary classifier that discerns between the background and features of the face, mouth, nose, and left and right eyes. This classifier is trained with a boosting machine-learning meta-algorithm. It is credited as the fastest and most accurate method for detecting faces in monocular grey-level images. Viola and Jones developed a real-time face detector comprised of a cascade of classifiers trained by AdaBoost. Each classifier employs an integral-image filter, which is reminiscent of Haar basis functions and can be computed very quickly at any location and scale. This is essential to the speed of the detector. At every level in the cascade, a subset of features is chosen using a feature-selection procedure based on AdaBoost.

The method operates on so-called integral images: each image element contains the sum of all pixel values above and to its left, allowing for the constant-time summation of arbitrary rectangular areas.

Integral Image. For the original image $i(x, y)$, the integral image $ii(x, y)$ is defined as follows:
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y'). \tag{8}$$
Using the following pair of recurrences,
$$s(x, y) = s(x, y - 1) + i(x, y), \qquad ii(x, y) = ii(x - 1, y) + s(x, y) \tag{9}$$
(where $s(x, y)$ is the cumulative row sum, $s(x, -1) = 0$, and $ii(-1, y) = 0$), the integral image can be computed in one pass over the original image. Using the integral image, any rectangular sum can be computed in four array references (see Figure 3).
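A short NumPy sketch of (8)-(9) and of the four-reference rectangle sum follows; the function and variable names are illustrative.

```python
import numpy as np

def integral_image(img):
    """ii(x, y) of (8): sum of all pixels above and to the left.
    A zero row/column is prepended so that ii(-1, y) = ii(x, -1) = 0."""
    ii = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, top, left, height, width):
    """Sum over any rectangle using four array references (Figure 3)."""
    return (ii[top + height, left + width]   # location 4
            - ii[top, left + width]          # location 2
            - ii[top + height, left]         # location 3
            + ii[top, left])                 # location 1
```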

The value of the integral image at location 1 is the sum of the pixels in rectangle $A$. The value at location 2 is $A + B$, at location 3 it is $A + C$, and at location 4 it is $A + B + C + D$. The sum within $D$ can therefore be computed as $4 + 1 - (2 + 3)$. Viola-Jones’ modified AdaBoost algorithm is presented in pseudocode [22] below.

Viola-Jones’ algorithm is as follows:
(i) Given sample images $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples, respectively, carry out the following.
(ii) Initialize weights $w_{1,i} = 1/2l, 1/2m$ for $y_i = 1, 0$, respectively, where $l$ and $m$ are the numbers of positive and negative examples.
(iii) For $t = 1, \ldots, T$:
  (1) normalize the weights, $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$;
  (2) choose the best weak classifier with regard to the weighted error
  $$\epsilon_t = \min_{f, p, \theta} \sum_{i} w_i \left| h(x_i, f, p, \theta) - y_i \right|; \tag{10}$$
  (3) define $h_t(x) = h(x, f_t, p_t, \theta_t)$, where $f_t$, $p_t$, and $\theta_t$ are the minimizers of $\epsilon_t$;
  (4) update the weights:
  $$w_{t+1,i} = w_{t,i} \beta_t^{1 - e_i}, \tag{11}$$
  where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \epsilon_t / (1 - \epsilon_t)$.
(iv) The final strong classifier is
$$C(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \dfrac{1}{2} \sum_{t=1}^{T} \alpha_t, \\ 0, & \text{otherwise}, \end{cases} \tag{12}$$
where $\alpha_t = \log(1 / \beta_t)$.

Thus, the semantic features of the database face images, $F_{face}^{I}$, $F_{mouth}^{I}$, $F_{nose}^{I}$, $F_{leye}^{I}$, and $F_{reye}^{I}$, and of the query images, $F_{face}^{q}$, $F_{mouth}^{q}$, $F_{nose}^{q}$, $F_{leye}^{q}$, and $F_{reye}^{q}$, are extracted based on the pixel intensity values.
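In practice, the pretrained Viola-Jones cascades that ship with OpenCV can extract such regions directly; a sketch follows, in which the cascade files, detection parameters, and file path are illustrative choices (OpenCV bundles face and eye cascades, but not mouth or nose cascades, by default).

```python
import cv2

# Load a grayscale face image ("face.jpg" is an illustrative path).
gray = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)

# Pretrained Viola-Jones cascades bundled with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

# Detect faces, then detect eyes inside each face region.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                      minNeighbors=5)
semantic_features = []
for (x, y, w, h) in faces:
    face_roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face_roi)
    # The pixel intensities of each detected region form the
    # high-level semantic features of the image.
    semantic_features.append((face_roi, eyes))
```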

4.2. Optimization

To obtain better image-retrieval results, the feature set of the query dataset is reduced by adaptive particle swarm optimization (APSO), a well-known optimization method from the computer-vision literature. Particle swarm optimization (PSO) is a population-based search algorithm created to imitate the flocking behavior of birds searching a cornfield for food, or the schooling behavior of fish. The method can efficiently find optimal or near-optimal solutions in large search spaces.

Each particle has a randomly initialized position $X_i = (x_i^1, x_i^2, \ldots, x_i^D)$, where $x_i^d$ is its position in the $d$th dimension; a velocity $V_i = (v_i^1, v_i^2, \ldots, v_i^D)$, where $v_i^d$ is its velocity in the $d$th dimension; and a personal best $P_i = (p_i^1, p_i^2, \ldots, p_i^D)$, where $p_i^d$ is its best position in the $d$th dimension. The swarm maintains a global best $G = (g^1, g^2, \ldots, g^D)$, where $g^d$ is the global best position in the $d$th dimension of the $D$-dimensional search space. In the course of each generation, every particle moves in the direction of its personal best position and of the global best position. The moving process of a swarm particle in the search space is described as
$$v_i^d(t+1) = v_i^d(t) + c_1 r_1 \left(p_i^d - x_i^d(t)\right) + c_2 r_2 \left(g^d - x_i^d(t)\right), \tag{13}$$
$$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1). \tag{14}$$
In (13), $c_1$ and $c_2$ are constants with the value of 2.0; $r_1$ and $r_2$ are independent random numbers generated in the range $[0, 1]$; $v_i^d$ is the velocity of the $i$th particle; $x_i^d$ is the current position of particle $i$; $p_i^d$ ($P_{best}$) is the best position found by the particle up to the current iteration; and $g^d$ ($G_{best}$) is the best position found by the swarm.

The adaptive particle swarm optimization (APSO) method is used here; by adapting the acceleration coefficients to the fitness distribution of the swarm, it provides more accurate results than standard PSO. The APSO acceleration coefficients are determined by
$$c_1 = c_{1\min} + \left(c_{1\max} - c_{1\min}\right) \frac{f_{avg} - f_{\min}}{f_{\max} - f_{\min}}, \qquad c_2 = c_{2\min} + \left(c_{2\max} - c_{2\min}\right) \frac{f_{avg} - f_{\min}}{f_{\max} - f_{\min}}, \tag{15}$$
where $c_{1\min}$ and $c_{1\max}$ are the minimum and maximum values of $c_1$; $f_{\min}$, $f_{avg}$, and $f_{\max}$ are the minimum, average, and maximum fitness values of the particles; and $c_{2\min}$ and $c_{2\max}$ are the minimum and maximum values of $c_2$.

4.2.1. APSO in Terms of Parameter-Optimization Phases

Figure 4 shows the structural design of adaptive particle swarm optimization for identifying the reduced set of features for image retrieval. The steps for parameter optimization are as follows (a code sketch of the loop is given after the list):
(i) Generate the particles randomly: for a population of size $N_p$, generate the particles randomly.
(ii) Describe the fitness function: choose the fitness function to be used for the constraints according to the current population; here, the fitness function in (16) is used to evaluate each particle.
(iii) Initialize $P_{best}$ and $G_{best}$: initially, the fitness value is calculated for each particle and set as the $P_{best}$ value of that particle. The best $P_{best}$ value is selected as the $G_{best}$ value.
(iv) Calculate the acceleration factors: the acceleration factors $c_1$ and $c_2$ are computed using (15).
(v) Compute the velocity: substituting the adaptive $c_1$ and $c_2$ values from (15) into the velocity equation (13), the new velocity becomes
$$v_i^d(t+1) = v_i^d(t) + c_1(t)\, r_1 \left(p_i^d - x_i^d(t)\right) + c_2(t)\, r_2 \left(g^d - x_i^d(t)\right). \tag{17}$$
(vi) Update the swarm: calculate the fitness function again and update the $P_{best}$ and $G_{best}$ values. If the new value is better than the previous one, replace the old $P_{best}$ with the current one and select the best $P_{best}$ as $G_{best}$.
(vii) Stopping criterion: continue until the solution is good enough or the maximum number of iterations is reached.
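The following is a compact sketch of steps (i)-(vii). Because the fitness function of (16) depends on the retrieval objective, it is passed in as a parameter; the inertia weight and coefficient bounds are illustrative assumptions rather than values from the paper, and the coefficient adaptation is a simple stand-in consistent with (15) that rescales $c_1$ and $c_2$ from the spread of the swarm's fitness values.

```python
import numpy as np

def apso(fitness, dim, pop=30, iters=100, c_min=0.5, c_max=2.5, w=0.7):
    """Adaptive PSO that minimizes `fitness` over [0, 1]^dim."""
    x = np.random.rand(pop, dim)                 # (i) random particles
    v = np.zeros((pop, dim))
    pbest = x.copy()                             # (iii) initialize Pbest
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()       # ... and Gbest
    for _ in range(iters):                       # (vii) stop at max iters
        f = np.array([fitness(p) for p in x])    # (ii) evaluate fitness
        improved = f < pbest_f                   # (vi) swarm update
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
        # (iv) adapt acceleration factors to the fitness spread (cf. (15)).
        spread = (f.mean() - f.min()) / (f.max() - f.min() + 1e-12)
        c1 = c_min + (c_max - c_min) * spread
        c2 = c_max - (c_max - c_min) * spread
        # (v) velocity update as in (17), then move the particles.
        r1, r2 = np.random.rand(pop, dim), np.random.rand(pop, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
    return gbest
```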

Thus, we obtain a reduced set of features from the APSO technique. This reduced set of features is then subjected to image retrieval using the squared Euclidean distance calculation.

4.3. Image Retrieval
4.3.1. Distance Measurement

Distance measurement is an important stage in image retrieval. A query image is given to the system, which retrieves similar images from the image database. Our main objective is to retrieve images by measuring the distance between the reduced set of query-image features obtained from the optimization process and the image features in the database. To retrieve the images, the squared Euclidean distance is computed as
$$SED\left(q_j, I_i\right) = \sum_{k=1}^{d} \left(F_k^{q_j} - F_k^{I_i}\right)^2, \qquad i = 1, \ldots, m, \; j = 1, \ldots, n, \tag{18}$$
where $I_i$ are the database images, $q_j$ are the query images, and $F_k$ denotes the $k$th of the $d$ selected features.

By exploiting (18), the relevant images, that is, those with the minimum SED values, are extracted. Following the aforementioned process, images similar to the query images are successfully retrieved from the image database.
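Under (18), retrieval reduces to ranking the database by squared Euclidean distance in the reduced feature space; a minimal sketch over precomputed feature matrices (names illustrative) follows.

```python
import numpy as np

def retrieve(query_features, db_features, top_k=5):
    """Rank database images by the squared Euclidean distance of (18).

    query_features: 1-D array of reduced query features (length d).
    db_features: 2-D array, one row of reduced features per image.
    """
    diff = db_features - query_features          # broadcast over rows
    sed = np.einsum('ij,ij->i', diff, diff)      # per-image sum of squares
    return np.argsort(sed)[:top_k]               # indices of nearest images
```

The returned indices identify the database images with the smallest SED to the query.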

5. Experimental Results

The proposed facial image-retrieval technique using APSO is implemented in the MATLAB (version 7.13) working platform on a machine with the following configuration:
Processor: Intel Core i3.
OS: Windows 7/XP.
CPU speed: 3.20 GHz.
RAM: 4 GB.

The proposed facial image-retrieval technique with the aid of APSO is analyzed using the Adience dataset [25], the IMM face dataset [26], and the locally collected facial dataset. The facial database consists of a total of 961 images, which include children, middle-aged people, and old people. For training, 313 images were used; 641 images were used for the testing process.

5.1. Performance Analysis

The performance of the proposed technique is compared with that of the PSO and GA algorithms to show its efficiency. The performance is evaluated by three quantitative performance metrics:
(i) F-measure.
(ii) Precision.
(iii) Recall.

5.1.1. Precision and Recall Values

Precision is the fraction of retrieved images that are relevant to the query, and recall is the fraction of the images relevant to the query that are successfully retrieved.

The precision and recall values are calculated using the following equations:
$$\text{Precision} = \frac{N_{RR}}{N_{TR}}, \tag{19}$$
$$\text{Recall} = \frac{N_{RR}}{N_{RD}}, \tag{20}$$
where
(i) $N_{RR}$ denotes the number of relevant images retrieved;
(ii) $N_{TR}$ represents the total number of retrieved images;
(iii) $N_{RD}$ is the total number of relevant images in the database.
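For completeness, a small computation of (19), (20), and the F-measure of Section 5.1.2 is shown below, with illustrative counts.

```python
def precision_recall_f(n_rr, n_tr, n_rd):
    """n_rr: relevant retrieved; n_tr: total retrieved;
    n_rd: relevant images in the database."""
    precision = n_rr / n_tr                                     # (19)
    recall = n_rr / n_rd                                        # (20)
    f_measure = 2 * precision * recall / (precision + recall)   # (21)
    return precision, recall, f_measure

# Example: 18 of 20 retrieved images are relevant, out of 25 relevant.
print(precision_recall_f(18, 20, 25))  # -> (0.9, 0.72, 0.8)
```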

5.1.2. F-Measure

A measure that combines precision and recall is the traditional F-measure:
$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{21}$$
Figures 5, 6, and 7 represent the sample input training images of children, old people, and middle-aged people, respectively.

From the input images, high-level and low-level features are extracted, and the extracted features are given to APSO to select the optimal features. After that, the squared Euclidean distance is calculated between the optimal features of the query image and those of the database images. Finally, the images with the smallest distances are retrieved.

Figure 8(a) shows the query images of children, and the corresponding retrieved images are given in Figure 8(b).

Table 1 shows the performance measures, precision, recall, and F-measure, calculated for children using the statistical measures given in [27].

Precision is one of the important performance measures in the image-retrieval process. Table 1 shows that the precision rate of the proposed technique is higher than that of the other techniques. Similarly, its F-measure is higher than that of the other techniques. Although the differences are small, they indicate the improvement in the performance of the proposed technique.

Figure 9(a) shows the query images of middle-aged people, and the corresponding retrieved images are given in Figure 9(b). In Table 2, the performance measures precision, recall, and F-measure are given for the proposed technique, PSO, and GA in middle-aged-people image retrieval. The proposed technique’s measures are higher than those of the other techniques, except for recall, which implies that the proposed technique performs better overall.

Figure 10(a) shows the query images of old people, and the corresponding retrieved images are given in Figure 10(b). In Table 3, the performance of the proposed technique in old-people image retrieval is analyzed using precision, recall, and F-measure, and these performance measures are compared with those of the other techniques, PSO and GA. Compared to PSO and GA, the proposed technique has higher precision and F-measure rates.

Discussion. The average rate for each measure is calculated from Tables 1, 2, and 3, and the resulting graph is given in Figure 11. The precision rate of the proposed technique is 0.9684, whereas PSO and GA have precision rates of 0.9399 and 0.9248, respectively; that is, the proposed technique exceeds the other techniques by 0.0285 to 0.0436 in precision. Furthermore, the performance of the proposed technique is evaluated using the F-measure. The F-measures of PSO and GA are 0.4872 and 0.4821, respectively, while the F-measure of the proposed technique is 0.4921, which is 0.0049 to 0.0100 greater than those of the other techniques. Therefore, the proposed APSO-based facial image-retrieval technique performs better than the other techniques.

6. Conclusions

In this paper, we have proposed an efficient semantic-based facial image-retrieval (SFIR) system using APSO and the squared Euclidean distance (SED) to reduce the shortcomings of the existing methods. In a comparative analysis, the performance of our proposed technique was compared with that of existing methods. The proposed technique has a higher precision rate and F-measure than other techniques such as PSO and GA. The comparison results show that our proposed APSO-based SFIR technique retrieves images more accurately than the existing methods. Hence, our proposed SFIR system using the APSO technique retrieves images more precisely and efficiently, achieving a higher retrieval rate.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.