Abstract

This paper combines principal component analysis (PCA) with a support vector machine tuned by particle swarm optimization (SVM-PSO) to develop a real-time face recognition system. The integrated scheme uses the SVM-PSO method to improve the validity of PCA-based image recognition under dynamic visual perception. Face recognition in most human-robot interaction applications is performed by PCA-based methods because of their dimensionality reduction. However, PCA-based systems are only suitable for faces with the same expression and/or the same view direction. Since facial feature selection can be considered a global combinatorial optimization problem in machine learning, the SVM-PSO is commonly used as an optimal classifier of the system. In this paper, the PSO implements the feature selection, and the SVMs serve as fitness functions of the PSO for the classification problem. Experimental results demonstrate that the proposed method simplifies features effectively and obtains higher classification accuracy.

1. Introduction

In recent years, numerous approaches have been developed to detect and recognize the features of human faces. They first filter the background and detect the face blocks in a digital image, then determine the facial features and generate characteristic vectors, localize the faces continuously, and finally recognize the main face. In general, human face image processing consists of two fields: face detection and face recognition. For human-robot interaction, recognizing faces is much more difficult than detecting them, because facial expressions and features change and are not easily recognized or predicted.

Two standard linear subspace projections are used for low-resolution facial images [1]: the PCA and the linear discriminant analysis (LDA) [2]. Both are used to distinguish between different styles of images. The PCA is essentially a compression procedure based on linear projection onto the subspace spanned by the principal eigenvectors (those corresponding to the largest eigenvalues) of the input covariance matrix. The LDA approach, first proposed by Fisher, identifies the directions in space along which the separation of the projections is maximized. Although the LDA is not always superior to the PCA in terms of recognition accuracy, the PCA + LDA approach has been applied successfully in some face recognition applications.

However, PCA and/or LDA based facial recognition may fail if the facial samples are captured from different directions. When face images are captured from different depths and view directions, the accuracy of PCA-based recognition is reduced heavily. Many face recognition methods have been proposed to overcome such problems, and some of them focus on feature extraction. The central issue in this paper is how to represent the most informative features with the fewest dimensions and then classify them.

Since the facial feature selection process can be considered a global combinatorial optimization problem in machine learning, the PSO-SVM is commonly used as an optimal classifier of the system. In the classification stage, the PSO implements the feature selection, and the SVMs [3, 4] serve as fitness functions of the PSO for the classification problem. The advantage of the SVM in a multidimensional space is that it can classify samples quickly and correctly by finding the best support vectors [5]. The PSO algorithm imitates the flocking behavior of birds [6-9]. In this paper, the PSO is used to tune the parameters of the SVM so that the image recognition process becomes faster and more stable.

The rest of this paper is organized as follows. Section 2 describes the face detection scheme, which consists of a smoothing filter, connected component labeling, and ellipse detection. Section 3 describes the face recognition system, which includes the adjustment of the image dimensions, the PCA, and the PSO-SVM classifier. In Section 4, experimental results are presented to demonstrate the feasibility of the proposed scheme. Finally, conclusions are given in Section 5.

2. The Face Detection System

Detecting a face in an image requires several key steps. The first is to detect the skin-color area; the second is to reduce the noise; the remaining steps are to decide by ellipse detection which region or regions are faces and to separate the face block from the image background. Figure 1 illustrates the flowchart of the proposed face detection scheme. The images captured in sequence by the webcam are first sent to the face detection system, where skin-color detection separates the area of the human face from the complex background. Next, the noise is filtered out by the noise reduction step. Finally, the extracted contours help to locate the position of the human face. After face detection, only the block of the human face remains in the image.

2.1. Skin-Color Detection

An RGB image always exhibits color variation because the illumination changes from capture to capture. For skin-color detection, a reliable definition of the skin-color range is essential. Since color values in RGB are very sensitive to varying illumination, most approaches adopt the YCbCr color model instead of RGB, because the value Y carries the luminance while the values Cb and Cr carry the chroma.

The YCbCr color space can be regarded as a modified YUV color model. However, YCbCr is not an absolute color space; it is a way of encoding RGB information. In the YCbCr color space, Y is the luma component, and Cb and Cr are the blue-difference and red-difference chroma components, respectively. Following the transform defined in the ITU-R BT.601 standard for digital component video, the YCbCr values are obtained from 8-bit RGB values by

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} + \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{1}$$

The resultant Y lies between 16 and 235, because the values from 0 to 15 are reserved as footroom and the values from 236 to 255 as headroom. In this paper, the skin-color region obtained from 150 sample images is 71 to 127 for Cb and 130 to 170 for Cr. With these ranges, the human face block is easily distinguished from the rest of the image according to the values of Cb and Cr.
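The skin-color test can be illustrated with a minimal sketch. It assumes 8-bit RGB input and the rounded BT.601 studio-range coefficients; the function names `rgb_to_ycbcr`, `is_skin`, and `skin_mask` are hypothetical, and only the Cb and Cr ranges come from the text.

```python
# Skin-color ranges reported in the text (from 150 sample images).
CB_RANGE = (71, 127)
CR_RANGE = (130, 170)


def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to studio-range YCbCr (rounded BT.601 coefficients)."""
    y = 16 + 0.257 * r + 0.504 * g + 0.098 * b
    cb = 128 - 0.148 * r - 0.291 * g + 0.439 * b
    cr = 128 + 0.439 * r - 0.368 * g - 0.071 * b
    return round(y), round(cb), round(cr)


def is_skin(r, g, b):
    """True when both chroma values fall inside the skin-color ranges."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_mask(image):
    """image: rows of (r, g, b) tuples. Returns a binary mask of 255 (skin) and 0."""
    return [[255 if is_skin(*pixel) else 0 for pixel in row] for row in image]
```

Because the decision uses only Cb and Cr, a change in brightness that mostly shifts Y leaves the classification largely unaffected, which is the reason given above for preferring YCbCr over RGB.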

A low-pass filter (LPF) is used to eliminate small noise and to connect the parts of incomplete regions. The LPF makes the filtered image smoother and more uniform. In general, the mask starts at the top left pixel of the preprocessed image and scans the whole image from left to right, referring to the 8 neighboring pixels of each pixel. Common mask sizes are $3 \times 3$, $5 \times 5$, and $7 \times 7$. The larger the mask, the stronger the filtering effect, but the greater the computation. A $3 \times 3$ mask, which covers each pixel and its 8 neighbors, is used for low-pass filtering in this paper.
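As an illustration of the scanning mask, the following sketch applies an averaging (box) filter to a gray-scale image stored as a list of rows. The default 3 x 3 size, the integer averaging, and the border handling (averaging only the in-bounds neighbors) are assumptions of this sketch, not details taken from the paper.

```python
def mean_filter(image, size=3):
    """Smooth a gray-scale image with a size x size averaging mask.

    Border pixels are averaged over the neighbors that lie inside the image.
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += image[yy][xx]
                        count += 1
            out[y][x] = total // count
    return out
```

The cost per pixel grows with the square of `size`, which is the trade-off between filtering strength and computation mentioned above.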

In addition, the morphological opening operation is commonly used to eliminate noise in an image. The opening operation consists of two operations, erosion and dilation: erosion is first applied to the binary image, and dilation is then applied to the result. After this procedure, the noise is removed from the image.

The morphological opening not only removes some of the noise pixels of the binary image but also makes the skin-color region more complete.
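The opening operation can be sketched as erosion followed by dilation on a binary image whose foreground pixels have the value 255. The 3 x 3 square structuring element and the treatment of pixels outside the image as background are assumptions of this sketch.

```python
# 3 x 3 square structuring element (an assumption of this sketch).
STRUCTURING_ELEMENT = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def _is_on(img, y, x):
    """True if (y, x) is inside the image and is a foreground pixel."""
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] == 255


def erode(img):
    """Keep a pixel only if every pixel under the structuring element is on."""
    return [[255 if all(_is_on(img, y + dy, x + dx)
                        for dy, dx in STRUCTURING_ELEMENT) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def dilate(img):
    """Turn a pixel on if any pixel under the structuring element is on."""
    return [[255 if any(_is_on(img, y + dy, x + dx)
                        for dy, dx in STRUCTURING_ELEMENT) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def opening(img):
    """Morphological opening: erosion first, then dilation of the result."""
    return dilate(erode(img))
```

An isolated noise pixel does not survive the erosion, whereas a solid block at least as large as the structuring element is restored by the dilation.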

After noise reduction, connected component labeling (CCL) is used to find the location of the human face. This method finds the connected pixels that belong to the same object in the image. It marks each block with a different label and records the size, height, and width of each independent object in the image. In this paper, 4-connectivity is used to label pixels. The scan of the binary image starts from the top left. At a pixel with coordinates $(x, y)$, the method first checks whether the pixel value is 255 and then checks the right $(x+1, y)$, left $(x-1, y)$, top $(x, y-1)$, and bottom $(x, y+1)$ neighbors for other 255 values. If a neighbor has the value 255, its coordinates are recorded and the pixel is set to 0. This check is repeated recursively until no 255-valued pixels remain in the group. Finally, the group with the largest number of 255-valued pixels is selected and labeled.

When the recursive scan of the image is finished, the labeled groups are measured. The region with the largest area among all numbered regions is taken as the region of the human face. This region is scanned to find where the facial boundary meets the black background. The resulting relative coordinates of two opposite corners, $(x_1, y_1)$ and $(x_2, y_2)$, are then used to capture the facial region from the original image. The biggest labeled region of the binary image is shown in Figure 2.
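The labeling procedure can be sketched as a 4-connected flood fill that returns the largest group and its bounding corners. The text describes a recursive scan; this sketch uses an explicit stack instead, which visits the same pixels without hitting Python's recursion limit. The function names are hypothetical.

```python
def largest_region(binary):
    """Return the (x, y) pixels of the largest 4-connected group of 255s."""
    height, width = len(binary), len(binary[0])
    img = [row[:] for row in binary]  # work on a copy; visited pixels become 0
    best = []
    for y0 in range(height):
        for x0 in range(width):
            if img[y0][x0] != 255:
                continue
            img[y0][x0] = 0
            stack, region = [(x0, y0)], []
            while stack:
                x, y = stack.pop()
                region.append((x, y))
                # Right, left, top, and bottom neighbors (4-connectivity).
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y - 1), (x, y + 1)):
                    if 0 <= nx < width and 0 <= ny < height and img[ny][nx] == 255:
                        img[ny][nx] = 0
                        stack.append((nx, ny))
            if len(region) > len(best):
                best = region
    return best


def bounding_box(region):
    """Two opposite corners of the region, or None for an empty region."""
    if not region:
        return None
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return (min(xs), min(ys)), (max(xs), max(ys))
```

The two corners returned by `bounding_box` play the role of the relative coordinates used to crop the facial region from the original image.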

Sobel edge detection [10] is used to detect the edges. In the facial binary image there are obvious differences between the background and the skin-color region, so edge detection uses the first derivative of the image to estimate the regional edge, computed as the magnitude of the gradient. The Sobel operator uses the gray-scale differences at an edge position, weighting the top and bottom, or left and right, neighbors to detect the edge of the object. The region of the face after Sobel edge detection is shown in Figure 3.
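For reference, a minimal sketch of the Sobel operator on a gray-scale image is given below. It uses the standard 3 x 3 Sobel kernels; leaving the one-pixel border at zero is an assumption of this sketch.

```python
import math

# Standard Sobel kernels: GX responds to left/right differences,
# GY responds to top/bottom differences.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(gray):
    """Gradient magnitude of a gray-scale image; border pixels stay 0.0."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(GX[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out
```

On a binary skin mask the response is zero inside uniform regions and large along the boundary between skin and background.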

Since the shape of a human face is similar to an ellipse whose vertical-to-horizontal axis ratio is approximately 1.2 : 1, an ellipse mask is used to locate the human face and to mark a boundary for extracting the edge of the image. After edge detection, the region that most resembles an ellipse gives the position of the face. The center of the ellipse, together with its circumference and the lengths of its axes, determines its position, shape, and size. To accommodate changes in face size, the elliptical model must adjust the ratio of its axis lengths while scanning the image in real time. Therefore, we designed an elliptical model defined by the coordinates $(x_0, y_0)$ of the center of the ellipse, with $a$ as the radius of the horizontal axis and $b$ as the radius of the vertical axis.

Face detection is finished after the ellipse detection processing. Ellipse detection works together with connected component labeling to find the facial region. The locations of the detected faces in the original image and in the binary image are shown in Figures 3 and 4, respectively. Next, the unnecessary region outside the elliptical range is removed, which eliminates the neck from the image and isolates the facial region. The relative coordinates are then used to capture the face from the original image. The captured facial image is shown in Figure 5.
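Removing the region outside the ellipse can be sketched as a simple membership test. The sketch assumes an axis-aligned ellipse with center `(x0, y0)`, horizontal radius `a`, and vertical radius `ratio * a`, where the default ratio of 1.2 follows the text; the parameter and function names are hypothetical.

```python
def inside_ellipse(x, y, x0, y0, a, ratio=1.2):
    """True if (x, y) lies inside the ellipse centered at (x0, y0)."""
    b = ratio * a  # vertical radius from the 1.2 : 1 face model
    return ((x - x0) / a) ** 2 + ((y - y0) / b) ** 2 <= 1.0


def apply_ellipse_mask(image, x0, y0, a, ratio=1.2):
    """Set every pixel outside the ellipse to 0 and keep the rest."""
    return [[pixel if inside_ellipse(x, y, x0, y0, a, ratio) else 0
             for x, pixel in enumerate(row)]
            for y, row in enumerate(image)]
```

In a scanning detector, `a` would be varied to follow changes in face size; here it is simply passed in by the caller.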

3. The Face Recognition System

3.1. Description of the Face Recognition System

Face recognition is executed after face detection is finished. However, since the dimensions of the detected facial images are not the same, normalization of the facial image becomes important. Normalization makes the dimensions of the facial region the same for every face, so that each processed image carries comparable information. After normalization, the captured image serves as the input to the face recognition system, and principal component analysis is used to calculate the features. This method reduces the dimension of the image and saves computation time. Because it treats the whole face as a feature, it can cope with changed expressions or the presence of glasses on the face.

The weight vectors of the processed facial images are calculated by PCA, and all the weight vectors are collected to build the database of the face recognition system. The user's identity is determined by the support vector machine, which compares the current image with the images in the database. The SVM is a fast and accurate classifier that suits classification and comparison applications, and it increases both the speed and the accuracy of the face recognition system. At the same time, particle swarm optimization (PSO) is used to design the parameter of the support vector machine, which makes the face recognition system complete and fast. The flowchart of the proposed face recognition system is shown in Figure 6.

3.2. Image Normalization

Achieving a high recognition rate requires not only a good recognition algorithm but also robust preprocessing of the face image. Preprocessing reduces the differences between input images and converts every image to the same dimensions for the database. Therefore, bilinear interpolation is used to resize each detected face image to a fixed size in pixels. After bilinear interpolation, feature extraction and recognition are executed on the facial image.

When an image is zoomed, the pixel coordinates of the original image and the final image no longer correspond one to one. When the image is zoomed in, one pixel of the original image may project onto many pixels of the final image. Similarly, when the image is zoomed out, many pixels of the original image may project onto one pixel of the final image. Each pixel of the final image must therefore be computed from the original image, and interpolation is needed to calculate it; otherwise the final image would be severely distorted. Interpolation uses discrete samples to construct a continuous function that passes through those samples and then evaluates this function at positions that were not sampled. Image zooming can thus be regarded as signal resampling, because an image is a signal composed of two-dimensional discrete samples. Bilinear interpolation uses the four neighboring pixels to calculate each new pixel. The diagram of the bilinear interpolation is shown in Figure 7.

In order to obtain a pixel $P$ of the final image, $P$ is projected back onto the original image, and its four neighboring pixels (A, B, C, D) are used together with the distances between $P$ and those four pixels. The closer a neighboring pixel is to $P$, the larger its contribution to $P$; conversely, the influence is smaller if the distance is larger, so the weight of a neighbor decreases linearly with its distance. In effect, bilinear interpolation applies linear interpolation three times. The first interpolation is taken between the two points (A, B) to obtain pixel E, as expressed in

$$E = (1 - \alpha) A + \alpha B. \tag{2}$$

The second interpolation is taken between the two points (C, D) to obtain pixel F, as expressed in

$$F = (1 - \alpha) C + \alpha D. \tag{3}$$

Finally, the third interpolation calculates the pixel $P$ from the two points (E, F), as expressed in

$$P = (1 - \beta) E + \beta F, \tag{4}$$

where $\alpha$ is the relative horizontal distance of $P$ with respect to the four neighboring pixels and $\beta$ is the corresponding relative vertical distance. If the distance from pixel to pixel is one unit, as assumed in this paper, these distances satisfy $0 \le \alpha \le 1$ and $0 \le \beta \le 1$. The adjusted facial dimension is shown in Figure 8.
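The three interpolation steps can be sketched as a small resizing routine. It assumes that A and B are the upper pair of neighbors and C and D the lower pair, and that the corners of the output image are aligned with the corners of the input image; the names `alpha` and `beta` for the relative horizontal and vertical distances, and the function name, are choices of this sketch.

```python
def bilinear_resize(image, new_h, new_w):
    """Resize a gray-scale image (list of rows) by bilinear interpolation.

    Assumes the input and the output are each at least 2 x 2 pixels.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * new_w for _ in range(new_h)]
    for i in range(new_h):
        sy = i * (h - 1) / (new_h - 1)   # source row position of output row i
        y0 = min(int(sy), h - 2)
        beta = sy - y0                   # relative vertical distance
        for j in range(new_w):
            sx = j * (w - 1) / (new_w - 1)
            x0 = min(int(sx), w - 2)
            alpha = sx - x0              # relative horizontal distance
            a, b = image[y0][x0], image[y0][x0 + 1]
            c, d = image[y0 + 1][x0], image[y0 + 1][x0 + 1]
            e = (1 - alpha) * a + alpha * b          # first interpolation
            f = (1 - alpha) * c + alpha * d          # second interpolation
            out[i][j] = (1 - beta) * e + beta * f    # third interpolation
    return out
```

For example, enlarging a 2 x 2 image to 3 x 3 places the average of the four original pixels at the center of the result.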

3.3. Principal Component Analysis

If face recognition were implemented directly after normalization, it would cost a lot of computing time, because the information of the original image is spread over every pixel. Hence, the dimensions of the image need to be reduced, and suitable features must be extracted that express most of the information in fewer dimensions. PCA achieves this by replacing the original data with a small number of mutually uncorrelated linear combinations. The linear combinations are chosen along the directions in which the data vary the most, so that the transformed data display the largest individual differences. The process of implementing PCA is as follows.

Suppose there are $N$ facial images as training samples, with original feature vectors $x_1, x_2, \ldots, x_N$, each of dimension $n$. The objective of PCA is to find a linear transformation matrix $W$ with a size of $n \times m$. It transforms each original feature vector of dimension $n$ into a more representative feature vector of dimension $m$, with $m < n$. The transformation is expressed in

$$y_i = W^T x_i, \quad i = 1, 2, \ldots, N. \tag{5}$$

Before the transformation, the mean vector is $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$. After the transformation, the mean vector is expressed in

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = W^T \bar{x}. \tag{6}$$

The total scatter matrix indicates the dispersion of all feature vectors about the mean vector $\bar{x}$. Before the transformation, the total scatter matrix $S_T$ has a size of $n \times n$ and is expressed in

$$S_T = \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T. \tag{7}$$

From (5) to (7), the total scatter matrix after the transformation, $\tilde{S}_T$, has a size of $m \times m$ and is expressed in

$$\tilde{S}_T = \sum_{i=1}^{N} (y_i - \bar{y})(y_i - \bar{y})^T = W^T S_T W. \tag{8}$$

In order to increase the dispersion of the transformed feature vectors about their mean, the transformation matrix should be chosen to maximize $\tilde{S}_T$. The transformation matrix is expressed in

$$W_{opt} = \arg\max_{W} \; W^T S_T W. \tag{9}$$

According to linear algebra, the trace or the determinant can be used to express the distribution of the elements of a matrix as a scalar. Therefore, (9) can be rewritten as

$$W_{opt} = \arg\max_{W} \; \operatorname{tr}\left(W^T S_T W\right). \tag{10}$$

In (10), in order to limit the magnitude of $W$ and keep the objective from growing without bound, the constraint $W^T W = I$ is added for the transformation matrix $W$ of size $n \times m$. With a diagonal matrix $\Lambda$ of Lagrange multipliers, the objective becomes

$$J(W) = \operatorname{tr}\left(W^T S_T W\right) - \operatorname{tr}\left(\Lambda \left(W^T W - I\right)\right). \tag{11}$$

The value of $J(W)$ is maximal when its first derivative with respect to $W$ is zero:

$$\frac{\partial J(W)}{\partial W} = 2 S_T W - 2 W \Lambda = 0. \tag{12}$$

Then, (12) becomes (13) by rearranging terms:

$$S_T W = W \Lambda. \tag{13}$$

Equation (13) shows that the columns of $W$ are eigenvectors of the matrix $S_T$ and that the diagonal entries of $\Lambda$ are the corresponding eigenvalues. Eigenvalues and eigenvectors are a special decomposition in matrix algebra from which the elements of the matrix can be reconstructed. In addition, the important information is concentrated in the eigenvectors that correspond to the larger eigenvalues, so the $m$ eigenvectors with the largest eigenvalues are used to compose $W$.

PCA offers three advantages in face recognition. First, the result can be calculated quickly and easily. Second, PCA retains the most information about the data that a linear projection can retain. Third, PCA uses the whole face for feature extraction, which helps it tolerate the presence of glasses and changes of facial expression. The operational procedure of PCA is as follows.

After normalization, $N$ facial images are used to train the PCA. Each sample is a matrix of size $p \times q$. Next, each sample is rearranged as a column vector $\Gamma_i$ of size $pq \times 1$, as shown in (14), where $\Gamma_1, \Gamma_2, \ldots, \Gamma_N$ represent the processed facial images:

$$\Gamma_i = \left[\gamma_{i1}, \gamma_{i2}, \ldots, \gamma_{i,pq}\right]^T, \quad i = 1, 2, \ldots, N. \tag{14}$$

Each facial sample corresponds to one $\Gamma_i$, and the mean vector over the $N$ samples is expressed in

$$\Psi = \frac{1}{N}\sum_{i=1}^{N} \Gamma_i. \tag{15}$$

The mean vector is the mean face, which represents the parts common to all faces. The common parts are then removed from each facial image to highlight the differences between them. The difference vector of each image is therefore obtained as

$$\Phi_i = \Gamma_i - \Psi, \tag{16}$$

wherein the matrix $A = [\Phi_1, \Phi_2, \ldots, \Phi_N]$ has the size $pq \times N$, and the covariance matrix of all faces is defined as $C = A A^T$. The eigenvalues and the eigenvectors of the matrix $C$ are expressed in

$$C u_i = \lambda_i u_i, \tag{17}$$

where $\lambda_i$ is an eigenvalue of $C$ and $u_i$ is the corresponding eigenvector.

Since the size of the matrix $A$ is $pq \times N$, the size of the matrix $C$ is $pq \times pq$. Calculating the eigenvalues and eigenvectors of such a large matrix is time consuming, so reducing the dimensions of the matrix saves calculation time effectively. Therefore, the smaller matrix $A^T A$, of size $N \times N$, is calculated first, and its eigenvectors $v_i$ are obtained from

$$A^T A v_i = \mu_i v_i. \tag{18}$$

Multiplying (18) on the left by the matrix $A$ gives

$$A A^T (A v_i) = \mu_i (A v_i), \tag{19}$$

in which $A v_i$ is an eigenvector of $C$ with eigenvalue $\mu_i$, because the matrix $C$ equals $A A^T$. By comparing (17) and (19), (20) can be obtained as follows:

$$u_i = A v_i, \quad \lambda_i = \mu_i. \tag{20}$$

By using (18), the matrix $A^T A$ is used to calculate the eigenvector $v_i$ and its eigenvalue $\mu_i$. Each resulting $u_i$ is considered an eigenface, as expressed in

$$u_i = A v_i = \sum_{k=1}^{N} v_{ik} \Phi_k. \tag{21}$$

The difference vector of an individual facial image is combined with the eigenfaces to build the feature space. The weight vector of a facial image $\Gamma$ in the feature space is calculated as

$$\omega_k = u_k^T (\Gamma - \Psi), \quad \Omega = [\omega_1, \omega_2, \ldots, \omega_K]^T, \tag{22}$$

where $K$ is the number of eigenfaces retained. Finally, each training sample of the face is substituted for $\Gamma$ in (22) to calculate its weight vector in the feature space. Through this computation, the weight vectors $\Omega$ of all training samples are taken as the database of the facial images after the training.
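The dimension-reduction trick, solving the small eigenproblem on $A^T A$ and mapping the result back through $A$, can be illustrated with a minimal pure-Python sketch. Two simplifications are assumed here: power iteration stands in for a full eigendecomposition, so only the dominant eigenface is recovered, and the samples are assumed not to be all identical. All function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dominant_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    v = [float(i + 1) for i in range(len(matrix))]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return eigenvalue, v


def top_eigenface(samples):
    """samples: list of flattened images. Returns (mean face, eigenface, eigenvalue)."""
    count, dim = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / count for k in range(dim)]
    diffs = [[s[k] - mean[k] for k in range(dim)] for s in samples]  # rows of A^T
    small = [[dot(di, dj) for dj in diffs] for di in diffs]          # A^T A, count x count
    mu, v = dominant_eigenpair(small)
    u = [sum(v[i] * diffs[i][k] for i in range(count)) for k in range(dim)]  # u = A v
    norm = dot(u, u) ** 0.5
    return mean, [x / norm for x in u], mu


def weight(sample, mean, u):
    """Projection of (sample - mean) onto the eigenface u."""
    return dot(u, [s - m for s, m in zip(sample, mean)])
```

With three samples of dimension four, the eigenproblem solved here is 3 x 3 rather than 4 x 4; for real face images the saving is far larger, because the number of pixels greatly exceeds the number of training images.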

3.4. SVM Based Classification for Face Recognition

Euclidean distance based methods [11] measure the difference between two vectors and are commonly used in pattern recognition systems. The similarity computation directly calculates the distance between two vectors. A smaller value means that the two vectors are closer, which indicates that the features of the two images are closer and that the images are similar. The Euclidean distance is expressed in

$$d_j = \sqrt{\sum_{i=1}^{K} \left(\omega_i - \omega_{ij}\right)^2}, \tag{23}$$

in which $d_j$ is the Euclidean distance between the feature vector of the target image and the feature vector of the $j$th image; $\omega_i$ is the $i$th element of the input feature vector; $\omega_{ij}$ is the $i$th element of the feature vector of the $j$th image saved in the database; $K$ is the dimension of the feature vector; and $j$ indexes the images saved in the database.

In general, if the Euclidean distance method is used directly in a face recognition system, it requires a lot of computation time, because the Euclidean distances are compared by bubble sort. For example, if there are one thousand entries in the database, the process must be performed one thousand times, and the larger the database, the longer the comparison time. Therefore, a support vector machine is used to assist with the face recognition problem. The calculated Euclidean distances are input to the feature space of the support vector machine, which performs the comparison and classification.
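The direct comparison can be sketched as follows: one distance is computed per database entry, and the entry with the smallest distance is selected. The function names are hypothetical, and a simple minimum search is used here in place of the bubble sort mentioned in the text.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(target, database):
    """Return (index of the closest entry, list of all distances)."""
    distances = [euclidean_distance(target, entry) for entry in database]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances
```

The number of distance computations equals the number of database entries, which is why the comparison time grows with the size of the database.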

The most important goal of a face recognition system is to raise the accuracy and shorten the computing time. Principal component analysis does raise the accuracy, as shown in previous experiments. However, the comparison of the Euclidean distances requires a lot of computing time. Therefore, support vector machine classification is designed to shorten the computing time of the face recognition system. The SVM proceeds as follows. First, a hyperplane is designed to classify the Euclidean distances, as shown in Figure 9.

In the figure, $d_j$ denotes the Euclidean distance between the target image and the $j$th image of the database. There are $N$ images in the database, with $j = 1, 2, \ldots, N$. In addition, $\bar{d}$ indicates the mean value of the Euclidean distances between the target image and all images in the database. The equation of $\bar{d}$ is expressed in

$$\bar{d} = \frac{1}{N}\sum_{j=1}^{N} d_j. \tag{24}$$

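In this method, the training data are

$$\left\{(x_i, y_i)\right\}_{i=1}^{N}, \quad y_i \in \{-1, +1\}, \tag{25}$$

where $x_i$ is the $i$th training input and $y_i$ is its class label. Then, consider the hyperplane

$$\mathbf{w}^T x + b = 0, \tag{26}$$

in which $\mathbf{w}$ is the normal vector of the hyperplane and $b$ is the deviation (bias) value. In order to find the separating hyperplane, a quadratic optimization problem needs to be solved. The constraint is expressed in

$$y_i\left(\mathbf{w}^T x_i + b\right) \ge 1, \quad i = 1, 2, \ldots, N. \tag{27}$$

The minimum value of $\frac{1}{2}\|\mathbf{w}\|^2$ must be determined. Since the objective is quadratic and the constraint is linear, this is a typical quadratic optimization problem. Introducing Lagrange multipliers $\alpha_i \ge 0$ for the linear constraints gives

$$L(\mathbf{w}, b, \alpha) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{N} \alpha_i \left[y_i\left(\mathbf{w}^T x_i + b\right) - 1\right]. \tag{28}$$

However, this form does not yet yield the optimal solution directly. The problem is handled by addressing its dual, which is expressed in

$$\max_{\alpha \ge 0} \; \min_{\mathbf{w}, b} \; L(\mathbf{w}, b, \alpha). \tag{29}$$

Setting the derivatives of $L$ with respect to $\mathbf{w}$ and $b$ to zero gives $\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i x_i$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$. After performing the substitution, the remaining problem is expressed in

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \; \alpha_i \ge 0. \tag{30}$$

After the optimal solution of the dual problem is determined, each Lagrange multiplier $\alpha_i$ corresponds to one training sample. If $\alpha_i > 0$, the sample is a support vector of the problem, and it lies on the margin of the separating hyperplane. The final decision function is expressed in

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N} \alpha_i y_i x_i^T x + b\right). \tag{31}$$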

With the support vector machine, the data whose Euclidean distances lie between zero and $\bar{d}$ on the hyperplane are retained. These data are used to recalculate $\bar{d}$, and the classification is executed again in the same way. The procedure is repeated until the data can no longer be divided, at which point the program terminates. The remaining datum represents the Euclidean distance closest to the target image, and the image it belongs to is the recognition result.
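The repeated retain-and-recompute procedure can be illustrated with a deliberately simplified sketch. Here a plain threshold at the mean distance stands in for the trained SVM hyperplane, so this is not an SVM implementation; it only shows how the candidate set shrinks in each round. The function name and the tie handling are assumptions of this sketch.

```python
def closest_by_mean_split(distances):
    """Index of the smallest distance, found by repeated splitting at the mean."""
    candidates = list(range(len(distances)))
    while len(candidates) > 1:
        mean = sum(distances[j] for j in candidates) / len(candidates)
        # Retain the candidates whose distance lies between zero and the mean.
        kept = [j for j in candidates if distances[j] <= mean]
        if not kept or len(kept) == len(candidates):
            break  # the remaining distances are (numerically) equal
        candidates = kept
    return min(candidates, key=lambda j: distances[j])
```

Each round discards every candidate above the current mean, so the set typically shrinks much faster than a pairwise comparison of all distances would allow.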

3.5. PSO-SVM Classifier

Particle swarm optimization (PSO) was proposed in [12-15]. The method is a form of swarm intelligence and belongs to the field of evolutionary search. Like the genetic algorithm (GA), it is an evolutionary optimization technique: both produce an initial set of solutions and apply evolution to find the optimal solution. The difference is that PSO has no crossover or mutation procedures. Information is shared through a single channel, and the search is updated according to the current optimal solution. Therefore, in general optimization problems, the PSO converges to the optimal solution more quickly than the GA.

The PSO originates from the foraging behavior of bird flocks. Kennedy used this concept to study optimization problems, in which each candidate solution is like a bird flying in space and is called a particle. Every particle moving in the space has a fitness value given by the objective function. In addition, each particle has a velocity that determines the direction and the distance of its movement. The particles fly through the solution space guided by their own successful experience and by the trajectory of the best particle in the current population. Each particle can also search independently in the PSO space. When an individual particle finds a better value of the function, the best search variable is recorded in its individual memory. Thus, each particle holds its own best search variable. Changing the next search direction according to this individual best memory is called the cognition-only model. At every search step, the individual best search variable is compared with the best search variable of the population, and the population's memory of the best variable is adjusted accordingly. At the same time, each particle changes its search velocity for the next step according to the best variable memory of the population; this process is called the social-only model. Through the evolutionary computation, the PSO calculates the optimal solution according to the best fitness value of the particles [16]. The flowchart of the PSO is shown in Figure 10.

The SVM requires the design of an important parameter. Therefore, PSO is applied to optimize this parameter. The particle's position in the PSO space stands for the parameter in the SVM space. Through the evolutionary computation, each particle updates its position, and the parameter is updated continually as well. By this procedure, the optimal value of the parameter can be found.

The PSO produces the particles of the initial population randomly and finds the optimal solution of the function through evolutionary computation. In each generation, a particle changes its individual search direction according to two memories: the first is the particle's own best variable memory, and the other is the best variable memory of the population. After the computation, the PSO calculates the optimal solution according to the optimal variable memory. Figure 11 shows the PSO search in a particular space.

Having a range of and , it is supposed that the coordinate of particle’s position was ; therefore the parameter of the SVM could be calculated by

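If the set of $N_p$ particles in the $k$th generation is called the population, it can be expressed in

$$P^k = \left\{X_1^k, X_2^k, \ldots, X_{N_p}^k\right\}. \tag{33}$$

The velocity vector and position vector of the $i$th particle ($i = 1, 2, \ldots, N_p$) in the $k$th generation ($k = 1, 2, \ldots, G$) are expressed in (34) and (35), respectively, as follows:

$$V_i^k = \left(v_{i1}^k, v_{i2}^k, \ldots, v_{iD}^k\right), \tag{34}$$

$$X_i^k = \left(x_{i1}^k, x_{i2}^k, \ldots, x_{iD}^k\right), \tag{35}$$

in which $D$ is the dimension of the search space and $X_i^k$ is the position of the $i$th particle in the $k$th generation. The process of the PSO can be explained as follows.

Step 1. The PSO is initialized. The generation counter $k$ is set to its initial value, and the number of particles ($N_p$), the number of generations ($G$), and the four parameters $c_1$, $c_2$, $w_{\min}$, and $w_{\max}$ are given.

Step 2. The initial velocity $V_i$ and the initial position $X_i$ of the $N_p$ particles are created.

Step 3. The fitness value $F_i^k$ of each particle in the $k$th generation is calculated by using (36). The fitness function is the reciprocal of the computing time $T$ of the recognition system:

$$F_i^k = \frac{1}{T\left(X_i^k\right)}. \tag{36}$$

Step 4. The individual best fitness $F_i^{pbest}$ and the individual best position $X_i^{pbest}$ of each particle are determined. The equation of $F_i^{pbest}$ is expressed in (37), and the equation of $X_i^{pbest}$ is expressed in (38), as follows:

$$F_i^{pbest} = \max_{1 \le l \le k} F_i^l, \tag{37}$$

$$X_i^{pbest} = X_i^{l^*}, \quad l^* = \arg\max_{1 \le l \le k} F_i^l, \tag{38}$$

where $F_i^{pbest}$ is the individual optimal fitness value from the start to the current generation.

Step 5. The index $q$ of the particle with the highest fitness is determined by

$$q = \arg\max_{1 \le i \le N_p} F_i^{pbest}. \tag{39}$$

Then, $F^{gbest}$ and $X^{gbest}$ are determined by

$$F^{gbest} = F_q^{pbest}, \tag{40}$$

$$X^{gbest} = X_q^{pbest}, \tag{41}$$

in which $X^{gbest}$ is the position vector of the particle with the global optimal fitness value from the start to the current generation.

Step 6. If $k = G$, go to Step 10; otherwise go to Step 7.

Step 7. The velocity vector is updated for each particle by

$$V_i^{k+1} = w V_i^k + c_1 r_1 \left(X_i^{pbest} - X_i^k\right) + c_2 r_2 \left(X^{gbest} - X_i^k\right), \tag{42}$$

where $V_i^k$ is the current velocity vector of the $i$th particle in the $k$th generation and $V_i^{k+1}$ is the next velocity vector of the $i$th particle in the $(k+1)$th generation. $r_1$ and $r_2$ are two uniformly distributed random numbers in $[0, 1]$. $c_1$ and $c_2$ are constants that are set to 2. $w$ is the inertia weight, which is defined by

$$w = w_{\max} - \frac{w_{\max} - w_{\min}}{G}\, k, \tag{43}$$

where $w_{\min}$ and $w_{\max}$ are, respectively, the minimum value and the maximum value of $w$; $w_{\min}$ is set to 0.4 and $w_{\max}$ is set to 0.9.

Step 8. The position vector is then updated for each particle by

$$X_i^{k+1} = X_i^k + V_i^{k+1}, \tag{44}$$

where $X_i^k$ is the current position vector of the $i$th particle in the $k$th generation and $X_i^{k+1}$ is the next position vector of the $i$th particle in the $(k+1)$th generation.

Step 9. Let $k = k + 1$ and go to Step 3.

Step 10. The optimal position vector of the particle with the optimal fitness value is determined.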

After the above procedure, each move of a particle in a generation creates a new parameter for the SVM. The computing time of the face recognition system is compared for each parameter value, and the best one in the current generation is searched for. It then serves as the global optimal solution that guides the PSO search in the next generation. Through this method, the best parameter of the SVM can be found to reduce the computing time of the face recognition system.
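The steps above can be sketched for a single scalar parameter. The sketch assumes the standard PSO update with a linearly decreasing inertia weight, using the values stated in the text (acceleration constants of 2, inertia weight from 0.9 down to 0.4). The function `toy_fitness` is a hypothetical stand-in for the reciprocal computing time of the recognition system, and the search bounds, swarm size, initial velocity scale, and position clamping are likewise assumptions.

```python
import random


def pso(fitness, lower, upper, n_particles=20, n_generations=50,
        c1=2.0, c2=2.0, w_min=0.4, w_max=0.9, seed=0):
    """Maximize fitness(x) over lower <= x <= upper with a 1-D PSO."""
    rng = random.Random(seed)
    span = upper - lower
    x = [rng.uniform(lower, upper) for _ in range(n_particles)]
    v = [0.1 * rng.uniform(-span, span) for _ in range(n_particles)]
    pbest = x[:]                                # individual best positions
    pbest_fit = [fitness(xi) for xi in x]       # individual best fitness values
    q = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[q], pbest_fit[q]   # population best
    for k in range(n_generations):
        w = w_max - (w_max - w_min) * k / n_generations  # inertia weight
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            v[i] = (w * v[i]
                    + c1 * r1 * (pbest[i] - x[i])
                    + c2 * r2 * (gbest - x[i]))
            x[i] = min(max(x[i] + v[i], lower), upper)   # keep inside the range
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i], f
    return gbest, gbest_fit


def toy_fitness(p):
    """Hypothetical fitness with a single peak at p = 3."""
    return 1.0 / (1.0 + (p - 3.0) ** 2)


best_parameter, best_fitness = pso(toy_fitness, 0.0, 10.0)
```

In the actual system, each fitness evaluation would run the SVM-based recognition with the candidate parameter and measure its computing time, which is far more expensive than the toy function used here.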

4. Experimental Results

4.1. Actual Experiments

In the experiments, the facial database contains ten people as training samples. Each person has ten facial images used as input samples, which gives one hundred samples in the real-time face recognition system. Through real-time detection, the current facial image is captured as the test face, and the PCA-SVM-PSO algorithm is used to execute the face recognition. The Euclidean distances between the test face and the samples of the facial database are classified, and the sample in the database whose Euclidean distance to the test face is smallest is taken as the result. The experiments of the real-time face recognition system are shown in Figure 12.

4.2. Experimental Comparison

The experimental comparison covers the training time, computing time, and recognition rate of three methods. The first uses PCA and the Euclidean distance with bubble sort. The second applies PCA and SVM-based classification. The third adopts PSO as a parameter adjustment scheme for the SVM-based classification. Fifty samples and one hundred samples are used in the analyses of the experimental results. Table 1 gives the comparison of the experiments with fifty samples. Table 2 summarizes the comparison of the experiments with one hundred samples, in which ED denotes the Euclidean distance with bubble sort.

From the results in Tables 1 and 2, one can see that the proposed algorithm raises the recognition rate and reduces the computing time of the real-time face recognition system. For the method combining PCA and the Euclidean distance with bubble sort, when the number of samples is doubled from 50 to 100, the average computing time grows almost in proportion, from 0.149 s to 0.296 s. By contrast, the time needed by PCA combined with SVM-PSO increases by only 40%. On this basis, the proposed algorithm is clearly better suited to cases with large sample sizes. The results also indicate that the proposed method is faster and more efficient than the other common face recognition methods compared.

5. Conclusions

A real-time face recognition system is designed by combining PCA with a hybrid biologically inspired algorithm, and this method effectively reduces the computing time. When the number of samples is doubled from 50 to 100, there is a time saving of 60% compared with the other methods. Furthermore, the SVM-PSO scheme speeds up the recognition and also enhances the performance of the face recognition. In the future, the face recognition system can be further developed on a chip in an embedded system.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National Science Council, Taiwan, under Grant nos. NSC 102-2221-E-218-017 and NSC 100-2632-E-218-001-MY3.