Abstract

With the rapid development of computer science and Internet technology, face recognition is widely used in fields such as public security, judicial and criminal investigation, information security, and access control: a public security system may need to find a criminal suspect in its database, and an access control system must quickly identify a person and match his or her identity information. As a stable, intuitive, and highly distinguishable biometric feature, the human face has attracted more and more attention from researchers. Compared with other biometric recognition methods, face recognition is direct, friendly, and convenient; users rarely resist it and accept it more readily than other recognition technologies, so it has received extensive attention and research. This paper designs a face recognition system based on a convolutional neural network. Compared with traditional face recognition methods, a convolutional neural network does not require the manual design of complex and time-consuming feature extraction algorithms; it only requires an effective network architecture followed by simple and efficient end-to-end training on a large number of samples, after which a good recognition result can be obtained. The design uses a target detection algorithm to perform accurate, real-time, and efficient face recognition and can accurately identify the people appearing in the camera.

1. Introduction

1.1. Research Background

In the narrow sense, facial recognition technology refers to a computer vision technology that performs authentication or retrieval by analyzing and comparing facial visual features. Common recognition technologies in daily life, such as fingerprint recognition, belong to the same field of biometric recognition, and facial recognition is likewise widely used in daily life. This authentication method has many advantages over traditional identity authentication. Unlike passwords, cards, and certificates, biometric features are difficult to forge and cannot be lost, which has made biometric technology one of the most secure and reliable authentication technologies available.

In the information age, information protection is particularly important. China attaches great importance to public security, and the demand for new security technologies and products is strong. Face recognition technology uses differences in individual facial characteristics to identify a person. The face is the most direct and most distinctive biological feature, which gives face recognition technology unique advantages. Because of this, face recognition has shown great potential in many fields, especially in public security investigation systems [1].

In today's "Internet +" environment, network business models such as O2O, P2P, B2C, and B2B are gradually emerging, and the key technology behind all of them is online payment. The security of online payment determines the development of these business models, so safe and fast payment is an urgent need of modern society. Because a face cannot be forgotten or stolen, and is convenient, fast, and secure, face recognition has become a new way to authorize online payments [2].

Face recognition has become a core technology for urban security and convenient living. With the construction of intelligent information cities in China and the investment and strategic shifts of high-tech companies, the applications of face recognition technology are becoming more and more widespread, such as face-based payment and face-based clock-in. It can be said that a new era of face recognition applications has arrived.

1.2. Research Meaning

Face recognition has characteristics that other recognition methods do not have, giving it unique advantages in specific applications [3, 4]:

1.2.1. Non-Contact, Imperceptible

Face recognition is generally carried out under visible light or under infrared light combined with visible light, and as long as the face is exposed, it is difficult to camouflage or alter. Therefore, face recognition can acquire identity information without being detected or resented, which is very useful when pursuing criminal suspects, who are generally highly vigilant and whose trust and identity information are difficult to obtain. Face recognition does not require direct contact between the person and the device, whereas methods such as fingerprint recognition do. On the one hand, collection can be completed inadvertently, without the person noticing and without much staff intervention; people rarely resist it, because the face, unlike fingerprints or the iris, is an exposed feature, so collection is easy to accept. On the other hand, it is clean and hygienic and does not spread infectious diseases; during epidemic prevention and control, this non-contact property is a unique advantage that other identification methods do not have [5, 6].

1.2.2. Concurrency and Simple Equipment Requirements

The main equipment requirement for face recognition is a high-resolution camera, so the hardware requirements are simple. With the improvement of front-facing smartphone cameras, more and more smartphone brands use face recognition as an important way to unlock the device. As requirements for urban security grow, urban video surveillance systems are becoming increasingly complete and the number of high-performance cameras is growing greatly. Compared with fingerprint recognition, iris recognition, and other recognition devices, face recognition therefore has an obvious hardware advantage. The price of face recognition equipment is also significantly lower than that of other recognition devices and falls within a range acceptable to users [7]. At the same time, with the improvement of surveillance systems, face recognition can also help recover lost property and crack down on human trafficking, because surveillance footage can be saved to reconstruct the scene, which other recognition methods cannot do. As its accuracy continues to improve, face recognition will become a large-scale social recognition technology [8].

1.2.3. Natural Advantage

The natural advantage simply means that faces can be distinguished by the human eye, without needing to be processed by a device [9]. Face recognition is very simple: the user only needs to stand in front of the camera for a few seconds, needs no training, and does not need to carry documents. It also improves security. Fingerprints and irises cannot be distinguished by the naked eye and must be scanned by devices, so they are not natural in this sense, and criminals have the opportunity to deceive such devices by collecting fingerprints or iris data to commit crimes [10, 11].

1.3. Development Trends

Stage 1: the initial stage of face recognition technology. This stage began with Parke's research on gray-scale models of the face. No key progress was made; in practical applications, identification was basically done manually, so recognition efficiency was very low, recognition results were poor, errors were common, the manual workload was too large, and the system had no automatic recognition capability [12, 13].

Stage 2: the transition stage of face recognition technology. This stage produced many very important results that pointed the direction for face recognition technology. After the initial stage, researchers began to treat face recognition as a pattern recognition problem: the geometric features of the face, such as the relative positions and distances of the nose, eyes, and mouth, form feature vectors that are matched for recognition. The main contributors include AJ Goldstein [5], LD Harmon, and AB Lesk, who used geometric feature parameters to represent frontal face images; Kaya, who applied statistical methods and used the Euclidean distance as the facial feature; and Kanade [6], who designed a semi-automatic backtracking recognition system. The eigenface method proposed by Turk and Pentland of MIT is of great significance and is the most representative method of this period. The FERET face recognition technology test created its own face image library, which has been used to evaluate various algorithms and to look for flaws and improvements [14].

Stage 3: the stage in which face recognition technology gradually matured. With the improvement of computing power and the rise of artificial intelligence and machine learning, face recognition technology developed further, and automatic face recognition became available. At this stage, the research focus gradually shifted to face recognition under non-ideal conditions, since current face recognition systems are still affected by changes in expression, pose, and lighting. Representative work includes face recognition based on the illumination cone model under multi-pose and multi-illumination conditions proposed by Georghiades, face recognition based on a 3D morphable model under multi-pose and multi-illumination conditions proposed by Blanz, and face recognition methods based on statistical learning with the support vector machine (SVM).

2. The Theoretical Basis of the Convolutional Neural Networks

2.1. Artificial Neural Nets

An artificial neural network is a mathematical model that mimics the way the human nervous system responds to external stimuli and processes information, as shown in Figure 1. This model has features similar to biological systems, such as nonlinearity, robustness, non-locality, adaptivity, and fault tolerance, and it can process fuzzy information. It has therefore achieved good results in pattern recognition, image processing, and other areas. Simply put, an artificial neural network is first given data with known input-output relationships; it then analyzes the relationship between inputs and outputs to obtain a rule, and uses this rule to predict the outputs corresponding to new inputs. A convolutional neural network is a kind of artificial neural network, so understanding artificial neural networks helps in understanding convolutional neural networks. Figure 1 shows a cerebellar model arithmetic computer.

2.1.1. Loss Function

The loss function [9] is used to measure how well the model's predictions match reality. Supervised learning adjusts the parameters of the model using the inputs and outputs of a set of data until the required performance is achieved. As an example, choose any model, which corresponds to some function; for any input, the model produces an output. The output may coincide with the true value, but there may also be some error. A function is therefore needed to describe the error of the model's output, and this function is called the loss function.
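
As a concrete illustration (not part of the original design), the mean squared error is one common loss function; the sketch below computes it with NumPy on hypothetical prediction values.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: average squared difference between
    the model output and the ground-truth value."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical example: three predictions against their true values.
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mse_loss(y_true, y_pred))  # 0.07
```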

2.1.2. Gradient Descent

Reducing the error between the model output and the true value amounts to finding the minimum of the loss function. In practice, we generally search for this minimum iteratively by gradient descent, as illustrated by the function curve in Figure 2.

Suppose that the vertical axis of the curve is the loss function value of the model and the horizontal axis is one of the model parameters; the loss function of the best predictive model we want to obtain must be minimal. As can be seen from Figure 2, the curve has two stationary points, a maximum and a minimum, but the function has only one minimum value. For the vast majority of machine learning models, the minimum cannot be obtained directly through calculus, so we can only try to approach the global minimum through step-by-step iteration.

For a function of one variable, the minimum can be approached in steps, moving each time in the direction in which the derivative is negative. For a function of several variables, the minimum can be approached step by step along the direction of steepest descent of its gradient; this is the gradient descent algorithm. To minimize the loss function L(θ), we compute its gradient and update the parameters in the direction of the negative gradient, θ ← θ − η∇L(θ), where the parameter η is known as the learning rate. The convergence of gradient descent at different learning rates is shown in Figure 3.
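
The update rule above can be written out directly. The following sketch runs plain gradient descent on a simple one-variable loss L(w) = (w − 3)², chosen purely for illustration; the learning rate and starting point are arbitrary.

```python
# Minimal gradient-descent sketch on the loss L(w) = (w - 3)^2,
# whose gradient is dL/dw = 2 * (w - 3).
learning_rate = 0.1
w = 0.0                              # arbitrary starting point
for step in range(100):
    grad = 2 * (w - 3)               # gradient of the loss at the current w
    w = w - learning_rate * grad     # update against the gradient direction
print(w)  # converges towards the minimum at w = 3
```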

It can be seen that a learning rate that is either too large or too small will harm convergence, so a suitable learning rate is very important for the convergence of the model.

2.1.3. Optimizer

The learning rate has a great impact on the quality of the trained model, but it is not easy to set: it should be neither too large nor too small. One therefore hopes that the model can both converge quickly and reach a good result during training. Scholars have proposed optimizers that adjust the learning rate automatically, introducing the concept of momentum to regulate the speed of gradient descent so that it accelerates when the loss should fall faster and thus converges quickly. There are many such optimizers, such as RMSprop, Adagrad, Adadelta, and Adam. This design uses the Adam optimizer, whose main advantages are as follows:
(i) Easy to use and fast.
(ii) Low memory footprint and low requirements on the computer configuration.
(iii) Suitable for unstable objective functions.

In general, the Adam optimizer is efficient and simple to call, making it well suited to this design.
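
As a hedged sketch of how such an optimizer is attached in practice (assuming tf.keras; the small two-layer model here is a placeholder, not the network of this design):

```python
import tensorflow as tf

# Hypothetical two-layer classifier; the point is only how Adam is attached.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
# Adam adapts the step size per parameter; only the initial learning rate is set.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```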

2.1.4. Error Backpropagation Algorithm

The error backpropagation algorithm, proposed by Rumelhart et al. in the 1980s, is one of the most widely influential learning algorithms. Its core idea [11] is to divide learning into a forward propagation process and a backpropagation process. In forward propagation, the input sample enters at the input layer, passes through the hidden layers, and produces an output at the output layer. If the actual output of the output layer does not match the desired output, the error enters the backpropagation process. In error backpropagation, the output error is passed back from the output layer through the hidden layers to the input layer in some form, and the error is apportioned to all the units in each layer to obtain the error signal of each layer, which is then used to correct the weights of each unit.
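
The two phases can be made concrete with TensorFlow's GradientTape, which records the forward pass and then propagates the error back to each parameter. The single linear unit below is only an illustration, not part of the system in this paper.

```python
import tensorflow as tf

# One forward/backward step for a single linear unit y = w * x + b;
# the values are illustrative.
w = tf.Variable(0.5)
b = tf.Variable(0.0)
x, y_true = tf.constant(2.0), tf.constant(3.0)

with tf.GradientTape() as tape:
    y_pred = w * x + b                # forward propagation
    loss = (y_true - y_pred) ** 2     # output error
grads = tape.gradient(loss, [w, b])   # error propagated back to each parameter
for var, g in zip([w, b], grads):
    var.assign_sub(0.1 * g)           # correct the weights along the negative gradient
```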

2.2. Convolutional Neural Network

A convolutional neural network is a feedforward neural network with a deep structure composed of many layers; Figure 4 shows a typical convolutional neural network architecture. The structure of a convolutional neural network (CNN) includes the convolutional layer, the pooling layer, the rectified linear unit (ReLU), and the fully connected layer.

2.2.1. Convolutional Layer

The convolutional layer [12] is the main building block of a convolutional network and performs most of the computationally heavy work. Its main job is to extract features [13] from the input image data. Convolution preserves the spatial relationships between pixels by learning image features over small squares of the input image. The input image is convolved with a set of learnable kernels, producing a feature map (activation map) that is then fed to the next convolutional layer as its input. Stacking convolutional layers is designed to extract information of various dimensions from the image, as shown in Figure 5; different features of the image are obtained through repeated extraction.
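
A minimal sketch of such a layer, assuming tf.keras and the 64 × 64 × 3 input size used later in this design; the input here is random data.

```python
import tensorflow as tf

# A single convolutional layer: 32 learnable 3x3 kernels, SAME padding,
# stride 1, applied to a 64x64 RGB input.
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                              strides=1, padding="same", activation="relu")
feature_maps = conv(tf.random.normal([1, 64, 64, 3]))
print(feature_maps.shape)  # (1, 64, 64, 32): one activation map per kernel
```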

2.2.2. The ReLU Activation Function Layer

ReLU [14], also known as the rectified linear unit, is a nonlinear function and a common activation function in artificial neural networks, as shown in Figure 6. Applying it means that the operation is applied to each pixel, setting all negative values in the feature map to zero. To understand how ReLU works, assume that a neuron receives an input x; its function is defined as f(x) = max(0, x).

As shown in the neuron model in Figure 7 and the formula above, the purpose of the ReLU activation function is to change the originally linear output into a nonlinear output; in practical industrial applications, nonlinear distributions of all kinds are far more common.

Other activation functions include the sigmoid function and the tanh function, shown in Figures 8 and 9. As x tends to infinity, the outputs of these two functions approach 0 and 1, and −1 and 1, respectively, but finding the gradient requires the first partial derivative of the function. Where the function value is nearly constant, the partial derivative approaches 0 and the gradient effectively disappears, which is called gradient vanishing; eventually the weight w and the bias b can no longer be updated. In contrast, for x > 0 the derivative of the ReLU function is constant and equal to 1, which also reduces the amount of computation in backpropagation. The sigmoid function can additionally cause numerical problems, because it is defined through an exponential whose value can become very large.
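
For reference, the three activation functions discussed here can be written in a few lines of NumPy (an illustrative sketch, not code from the original system):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # f(x) = max(0, x); derivative 1 for x > 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # saturates towards 0 and 1

def tanh(x):
    return np.tanh(x)                  # saturates towards -1 and 1

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x))
```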

2.2.3. Pooling Layer

The function of the pooling layer is to reduce the size of the feature matrix, which reduces the amount of computation in subsequent operations. The pooling layer reduces the width and height of the matrix without changing its depth, extracting the main features of the matrix; it is undeniable, however, that some information is lost after pooling. Pooling layers generally subsample each region through nonlinear operations such as average pooling or maximum pooling, thereby achieving better generalization, faster convergence, and better robustness to translation and distortion. The pooling layer is usually located after the convolutional layer. The process of maximum pooling is shown in Figure 10.
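
A short tf.keras sketch of 2 × 2 max pooling, showing how the width and height are halved while the depth is kept; the sizes follow the network designed later, and the input here is random data.

```python
import tensorflow as tf

# 2x2 max pooling halves height and width and keeps the channel depth.
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))
x = tf.random.normal([1, 64, 64, 32])
print(pool(x).shape)  # (1, 32, 32, 32)
```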

2.2.4. Fully Connected Layer

The convolutional layers extract local features; the fully connected layer recombines the extracted local features into a new representation through a weight matrix. Because it connects to all of the extracted local features, it is called the fully connected layer (Figure 11).

After layers of convolution and pooling have extracted the local features in the picture, the first fully connected layer activates some of its neurons; the role of the fully connected layers is to integrate these responses into the neurons of the second fully connected layer, and from the combined features the network can, for example, conclude that the object in the image is a cat.
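
A minimal tf.keras illustration of this flattening and recombination, with sizes matching the network described in Section 3.3.1; the input is random data.

```python
import tensorflow as tf

# The extracted local feature maps are flattened into one vector and
# recombined by a fully connected layer.
x = tf.random.normal([1, 8, 8, 64])          # feature maps from the last pooling layer
flat = tf.keras.layers.Flatten()(x)          # 8 * 8 * 64 = 4096 values
out = tf.keras.layers.Dense(512, activation="relu")(flat)
print(out.shape)  # (1, 512)
```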

3. Design of the Face Recognition System

3.1. Overall Flow of the System

The face recognition system based on a convolutional network is designed to detect faces in the camera images and compare them with the previously trained faces in the database to determine whether they belong to the same person. The overall flow is as follows:
(i) Photo collection: acquire video images from the camera.
(ii) Face detection: detect whether there is a face in the image and, if so, send the picture on for pre-processing; if there is no face, return to the collection stage.
(iii) Image pre-processing: face correction and cropping of the face pictures.
(iv) Image feature extraction: extract face features through the convolutional neural network.
(v) Feature matching: compare the extracted feature vector with the feature vectors of the face pictures in the library to obtain the judgment result.
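
The loop below sketches this flow in Python with OpenCV; the four helper functions are hypothetical stubs standing in for the modules designed in the remainder of this paper.

```python
import cv2
import numpy as np

# High-level sketch of the flow above; the helpers are hypothetical stubs.
def detect_faces(frame):
    return []                                   # (ii) face detection (stub)

def preprocess(face):
    return cv2.resize(face, (64, 64))           # (iii) correction and cropping (stub)

def extract_features(image):
    return np.zeros(512)                        # (iv) CNN feature extraction (stub)

def match(vector):
    return "unknown"                            # (v) comparison against the library (stub)

camera = cv2.VideoCapture(0)                    # (i) photo collection
for _ in range(100):                            # process a bounded number of frames
    ok, frame = camera.read()
    if not ok:
        break
    for face in detect_faces(frame):
        print(match(extract_features(preprocess(face))))
camera.release()
```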

3.2. Introduction of the Development Language and Library

Python is a fully object-oriented language. The most obvious feature that distinguishes Python from other languages is its simplicity: it is among the most concise of programming languages, it is easy for beginners to learn, and it has many excellent libraries to support development. In addition, compared with other programming languages, Python can often achieve the same function with the shortest code.

3.2.1. OpenCV Library

Any mention of computer vision has to mention OpenCV, a very widely used computer vision library. OpenCV contains hundreds of algorithms for computer vision, machine learning, and image processing, covering both classic algorithms and the most advanced current methods; it can be used to detect and recognize faces and to extract 3D models of objects. Its powerful capabilities make OpenCV widely used in many fields, such as robot navigation and object search, stitching together city street scenes, and autonomous driving by unmanned cars. In this design, the OpenCV library is used to process the images.
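
A minimal example of the kind of OpenCV calls used in this design (the file name is an assumption):

```python
import cv2

# Load an image from a hypothetical path, convert it to grayscale,
# and resize it for further processing.
image = cv2.imread("face.jpg")                   # "face.jpg" is an assumed file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # OpenCV loads images in BGR order
small = cv2.resize(gray, (64, 64))
cv2.imwrite("face_64x64.jpg", small)
```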

3.2.2. TensorFlow

TensorFlow is an important software library that appears frequently in the fields of machine learning and deep learning, and its main function is high-performance numerical computation and analysis. "Tensor" refers to the tensors that carry the data transmitted between nodes, and "Flow" refers to the data stream, that is, the data flowing through the nodes of the computation graph.
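
A tiny illustration of tensors flowing through an operation (the values are arbitrary):

```python
import tensorflow as tf

# "Tensors" carry the data and "flow" through operations: here two constant
# tensors are multiplied, the kind of numerical work TensorFlow performs.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])
print(tf.matmul(a, b))  # [[3.], [7.]]
```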

3.2.3. NumPy Library

The NumPy library is Python's basic library for scientific computing, and much machine learning and deep learning research relies on it. NumPy mainly implements matrix computation; it can handle high-order and large numbers of matrices and vectors and provides a relatively rich set of functions. It is an important scientific computing library.
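
A few representative NumPy operations of the kind used in this system (the values are arbitrary):

```python
import numpy as np

# Matrix and vector arithmetic used throughout the system.
m = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])
print(m @ v)               # matrix-vector product: [3 7]
print(m.T)                 # transpose
print(np.linalg.norm(v))   # vector length, used later for angle comparison
```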

3.3. Overall Design of Face Recognition System

3.3.1. The Design of the Convolutional Neural Network

The design of the convolutional neural network is the core of this work. The convolutional neural network is responsible for face feature extraction and for training the model, so its structure determines the subsequent face recognition performance [15–28]. The network designed here has eight layers: three convolutional layers, three pooling layers, one fully connected layer, and one output layer.

The first layer (convolution): this layer extracts information of various dimensions from the image, increasing the number of channels while keeping the image size unchanged. The input image size is 64 × 64 × 3, and the output after convolution is 64 × 64 × 32. The convolution kernel size is (3, 3), the stride is 1, and SAME padding fills the image boundary pixels during convolution. The number of input channels is 3 and the number of output channels is 32.

The second layer (max pooling): its main purpose is to reduce the size of the matrix and thus the amount of computation in subsequent operations. The input size is 64 × 64 × 32, the pooling window is 2 × 2, and the output length and width are half those of the input, so the pooled output is 32 × 32 × 32, which reduces the computation on the image. A dropout layer then randomly drops some neurons with a certain probability to obtain faster training.

The third layer (convolution): input size 32 × 32 × 32, output size 32 × 32 × 64, kernel size (3, 3), 32 input channels, 64 output channels, stride 1.

The fourth layer (pooling): input size 32 × 32 × 64, pooling window 2 × 2, output 16 × 16 × 64, which further condenses the image information and eases computation.

The fifth layer (convolution): input size 16 × 16 × 64, output size 16 × 16 × 64, kernel size (3, 3), 64 input channels, 64 output channels, stride 1.

The sixth layer (pooling): input size 16 × 16 × 64, pooling window 2 × 2, output 8 × 8 × 64.

The seventh layer (fully connected): to enhance the nonlinearity of the network and limit its size, a fully connected layer is used. Each of its neurons is connected to the neurons of the previous layer, and the 8 × 8 × 64 input is compressed into a one-dimensional vector of size 1 × 512.

The eighth layer (output): the output of the system falls into two categories, faces saved in the database and faces not saved in the database, which realizes the recognition function. The input to the output layer is 1 × 512 and the output is 1 × 2.
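
The architecture above can be sketched in tf.keras as follows. The layer sizes follow the text; the activation functions and the dropout rate are assumptions, since the original does not state them.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of the eight-layer network described above (activations and
# dropout rate assumed; sizes follow the text).
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), strides=1, padding="same", activation="relu",
                  input_shape=(64, 64, 3)),          # layer 1: 64x64x3 -> 64x64x32
    layers.MaxPooling2D((2, 2)),                      # layer 2: -> 32x32x32
    layers.Dropout(0.25),                             # random neuron dropping (rate assumed)
    layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),  # layer 3
    layers.MaxPooling2D((2, 2)),                      # layer 4: -> 16x16x64
    layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),  # layer 5
    layers.MaxPooling2D((2, 2)),                      # layer 6: -> 8x8x64
    layers.Flatten(),
    layers.Dense(512, activation="relu"),             # layer 7: fully connected, 512 units
    layers.Dense(2, activation="softmax"),            # layer 8: two output classes
])
model.summary()
```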

3.3.2. Monitoring Picture Acquisition Subsystem

This design uses a USB digital camera with a modern data transmission interface. It plugs directly into the computer's USB port, is easy to operate, and costs less than a traditional surveillance camera. The working principle is shown in Figure 12: first, the lens collects the image; then the light sensor and control components inside the camera convert the image into a digital signal; finally, the signal is sent to the computer through the USB connection.
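
A minimal capture sketch with OpenCV, assuming the USB camera is the first video device (index 0):

```python
import cv2

# Grab one frame from the USB camera described above.
camera = cv2.VideoCapture(0)
ok, frame = camera.read()
if ok:
    cv2.imwrite("frame.jpg", frame)   # save a captured frame for inspection
camera.release()
```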

3.3.3. Face Detection

Dlib comes with a HOG + SVM based face detector, a widely used face detection model consisting of five HOG filters covering the frontal view, the left view, the right view, the frontal view rotated to the left, and the frontal view rotated to the right.

This detector is the fastest method for detection on the CPU; it works for frontal and slightly non-frontal faces, still functions under small occlusions, and covers most common cases. Its main drawback is that it cannot detect small faces, since the minimum face size it was trained on is 80 × 80, so the program must ensure that faces are larger than this size. It is also unsuitable for profile views and extreme poses, such as faces seen from steeply above or below, and it does not work under heavy occlusion.
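
A short sketch of calling this detector from Python with Dlib (the input file name is an assumption):

```python
import cv2
import dlib

# Dlib's HOG + SVM frontal face detector, as used in this design.
detector = dlib.get_frontal_face_detector()
image = cv2.imread("frame.jpg")                  # hypothetical input frame
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = detector(gray, 1)                        # 1 = upsample once to help with smaller faces
for rect in faces:
    print(rect.left(), rect.top(), rect.right(), rect.bottom())
```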

3.3.4. Image Processing

Pre-processing of an image usually includes grayscale conversion, normalization of the image data, resizing, and so on. Face correction is needed because the captured face may be tilted, bowed, or raised, so its position is not upright. The face is first cropped, each facial landmark is located, and the coordinates of these landmarks are compared with those of an upright face; the angular difference is the angle by which the head is tilted. Rotating the image in the opposite direction by this angle makes the face upright, as shown in Figure 13.

Converting the image to grayscale reduces the amount of information and hence the computation required, and resizing the image minimizes the influence of the background on the picture.
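
A sketch of these pre-processing steps for one cropped face, using the 64 × 64 size adopted by the network; the file name is an assumption.

```python
import cv2
import numpy as np

# Resize the cropped face, normalize its pixel values, and add the
# batch dimension expected by the network.
face = cv2.imread("face.jpg")                    # hypothetical cropped face image
face = cv2.resize(face, (64, 64))                # unify the input size
face = face.astype(np.float32) / 255.0           # normalize pixel values to [0, 1]
batch = np.expand_dims(face, axis=0)             # shape (1, 64, 64, 3)
```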

3.3.5. Eigenvector Contrast

The known face feature vectors are stored in the database, so recognition requires comparing feature vectors. First, the face image from the video is fed into the convolutional neural network, which generates its feature vector through convolution and pooling operations. The extracted features are the main basis for judgment: the similarity between the feature vector of the face in the video and the stored feature vector is computed. Vectors have both magnitude and direction, and if the angle between two vectors is very small, the two vectors are very close; by comparing the angles between vectors, we can judge whether two faces are similar. The same face yields a small angle between vectors, while different faces yield a larger angle.
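
Angle comparison is commonly implemented as cosine similarity; the sketch below is illustrative, and the 0.9 threshold and random vectors are assumptions rather than values from the original system.

```python
import numpy as np

# Compare two feature vectors by the angle between them.
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

known = np.random.rand(512)       # feature vector stored in the database
probe = np.random.rand(512)       # feature vector extracted from the camera frame
same_person = cosine_similarity(known, probe) > 0.9   # small angle => same face
```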

4. Implementation of the Face Recognition System

4.1. Establishment of the Face Database

Face detection is performed with Dlib's frontal_face_detector, the detected faces are cropped, and the contrast and brightness of each picture are adjusted. The contrast and brightness values are random, which increases the diversity of the samples. Finally, each picture is resized to 64 × 64 and saved in the database. In this way, 10,000 pictures of the author's face were collected from the camera, with variations in background brightness, expression and pose, facial occlusion, and wearing or removing glasses. The process of face image collection is shown in Figure 14.
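
A sketch of this collection step: random contrast and brightness adjustment, resizing to 64 × 64, and saving. The adjustment ranges and file paths are assumptions made for illustration.

```python
import random
import cv2

# Random contrast (alpha) and brightness (beta) applied to a detected face
# before saving it to the face database; the ranges are assumptions.
def relight(face, alpha=1.0, beta=0):
    return cv2.convertScaleAbs(face, alpha=alpha, beta=beta)

face = cv2.imread("face.jpg")                               # hypothetical detected face crop
face = relight(face, random.uniform(0.5, 1.5), random.randint(-50, 50))
face = cv2.resize(face, (64, 64))
cv2.imwrite("dataset/my_face_0001.jpg", face)               # path and name are assumptions
```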

After building one's own face library, other people's faces are also needed to train the convolutional neural network. They can be crawled from the Internet with Python, or taken from ready-made face databases in the face recognition field, such as the Yale face database, the Cambridge ORL face database, and the FERET database of the US Department of Defense. This design uses the LFW face database, produced by the University of Massachusetts, an unconstrained dataset of face pictures in natural scenes collected from the Internet. The database contains images of more than 5,000 people and nearly 14,000 photos; it is well known in academia and appears in many deep learning papers on face recognition. After downloading the LFW face library from the official website of the University of Massachusetts, the same operations are performed: the face pictures are cropped, resized to 64 × 64, and saved in the face database, as shown in Figure 15. With this, the face library needed for this design is established.

4.2. Training of Convolutional Neural Networks

The main training steps of the convolutional neural network are to first read in the prepared face database and convert the face images and labels into arrays. These images are then divided into a training set and a test set; in this design they are split at a ratio of 20 : 1 and normalized. Finally, the face pictures are fed into the convolutional neural network for training.
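
A self-contained sketch of these steps. The arrays and the small stand-in model are placeholders so the snippet runs on its own; in the actual system the images come from the face library and the model is the eight-layer network of Section 3.3.1.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Dummy data standing in for the face library (210 images, two classes).
images = np.random.randint(0, 256, size=(210, 64, 64, 3), dtype=np.uint8)
labels = np.random.randint(0, 2, size=(210,))

# 20 : 1 split between training and test sets, then normalization.
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=1 / 21, random_state=0)
x_train, x_test = x_train / 255.0, x_test / 255.0

# Stand-in model; the real design uses the network sketched in Section 3.3.1.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```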

The training process is shown in Figures 16–18. It can be seen that as the loss function value becomes smaller and smaller during training, the recognition accuracy of the model becomes higher and higher, which is the advantage of the convolutional neural network.

4.3. Implementation of Face Recognition

The main steps to realize face recognition are to open the camera to obtain pictures and convert them to grayscale, detect faces with Dlib's frontal_face_detector, load the trained convolutional neural network, and finally realize recognition through feature vector comparison.

After implementing the face recognition function, a simple test was carried out to gauge the recognition performance of the system: out of 100 face tests, the system successfully distinguished the author's face from other people's faces 97 times, as shown in Figure 19.

Occasionally the system failed to identify another person's face shown on a phone screen, which may be affected by the camera resolution and the brightness of the phone's display, as shown in Figure 20.

In two further cases, another person's face was identified as the author's, as shown in Figure 21.

4.4. Advantages and Disadvantages of the System

After many experiments, the system shows the following advantages:
(i) The system identifies faces relatively accurately and stably, performs image pre-processing quickly, and surrounds the face contour with a frame.
(ii) The recognition and response speed is fast. The system uses the GPU for computational acceleration, which greatly speeds up its response, and it basically meets the requirements of real-time identification.
(iii) The system is not sensitive to lighting changes; the lighting conditions changed constantly during the experiments, yet the system still identified faces accurately, which is helpful for real-life applications.
(iv) The recognition rate is high. Apart from a few identification errors, the system recognizes faces correctly most of the time and meets the design requirements, handling variations in pose and expression as long as the face is not covered.
The system mainly has the following shortcomings:
(i) Achieving a good recognition effect requires training on a large number of face pictures; if fewer pictures participate in training, the recognition effect is relatively poor.
(ii) The hardware requirements are high. Training the convolutional network requires a graphics card with compute capability above 3.0, and the CPU, memory, and graphics memory usage are high.
(iii) The functions of the system can be further improved; for example, the number of faces in the database can be increased, and a logging function can be added to save records of people entering and leaving for later review.

5. Conclusion

Face recognition is an identification technology based on human facial information, and the convolutional neural network has been an active topic in deep learning in recent years. This work realizes a face recognition system based on a convolutional neural network. The development of biometric recognition and of face recognition technology was surveyed and briefly introduced, and the advantages of face recognition technology were summarized. The basic theory of convolutional neural networks, including the loss function, the gradient descent algorithm, and the error backpropagation algorithm, was also reviewed. A convolutional neural network model was designed and, using its excellent feature extraction ability, combined with the monitoring image acquisition and face detection modules to build a face recognition system with good recognition performance.

Although the face recognition system completed in this paper achieves a good recognition effect, there is still a large gap between it and the best current systems, and its recognition ability needs to be improved continuously in future work. In addition, the system offers too few functions; many more could be added, such as log recording and face tracking.

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Young and Middle-Aged Teachers Education Research Project of Fujian Province (No. JAT210737).