Abstract

Digital human resource management can improve the organizational and operational efficiency of enterprises. To improve the efficiency of enterprise digital management and to solve the problems of the low security level and insufficient stability of 2D face recognition, we introduce 3D face recognition into the digital human resource management system. We propose a face recognition method based on a multistream convolutional neural network and the local binary pattern and build a digital face recognition management system. We first build the system's computer vision scene. Then a local binary pattern facial expression feature extraction scheme is designed according to the depth camera image extraction method. Because 3D facial features are easily missed, we build a multistream convolutional neural network to learn them. Finally, we validate the effectiveness of the method on selected public face datasets. Experiments show that our method reaches 98% face recognition accuracy, significantly better than the other methods tested.

1. Introduction

An efficient human resource management (HRM) approach is necessary to maintain organizational efficiency and helps companies iterate on organizational change. Most HRM teams are currently introducing intelligent management systems into their companies' personnel management work, reducing workload and lowering HRM costs. Researchers have proposed many methods for intelligent HRM, and a large number of research results have been achieved. An intelligent HRM system is a large and complex system [1, 2]. Considering the multifunctionality and complexity of HRM systems, and building on our recent research, we focus on the HR clock-in assistance system. Among clock-in assistance systems, the face recognition clock-in system is the most widely applied [3–5].

The human face is a unique biological feature, and each person's face contains features such as expressions, facial structure, and contours. The rational use of facial biometrics within an ethical scope can create great economic benefits, but facial expression feature extraction and analysis is a difficult task. How to correctly extract facial expression features is a major research hotspot, and once facial features are extracted, how to use them efficiently and incorporate them into automatic feature learning networks is also a major challenge. Initially, facial recognition researchers favored machine learning methods, and recognition operated only at the two-dimensional level [6–9]. Commonly used machine learning methods include principal component analysis and support vector machines. However, traditional machine learning methods require a high level of manual feature design and cannot automate the feature labeling process. In addition, traditional machine learning methods struggle to remain stable under the irregular changes of an unstructured environment; as a result, they were gradually replaced by deep learning methods in subsequent research. Deep learning methods require a large database as the training set, and the size of the training set determines the accuracy of deep learning models for face recognition. The earliest deep learning face recognition methods also started at the two-dimensional level and were mainly based on pixel features. Two-dimensional facial features always suffer from occlusion and missing information, so they cannot support high-precision facial biometric feature learning. Three-dimensional facial feature analysis based on deep learning is more accurate, and 3D facial recognition systems are chosen by sites and companies with high security requirements. A three-dimensional facial recognition algorithm places relatively high demands on computer vision scene construction. An ordinary RGB camera captures only two-dimensional pixel features; capturing three-dimensional facial biometric features requires depth cameras to obtain the depth information of the visual scene, followed by operations such as depth data reduction and three-dimensional scene reconstruction. Accordingly, depth cameras are configured in 3D facial recognition systems to scan face information [10–14].

To enhance the security of the digital management of enterprises, we introduce 3D face recognition into the digital human resource management system. We propose a face recognition method based on a multistream convolutional neural network and the local binary pattern and build a digital face recognition management system. We first build the system's computer vision scene. Then we design a local binary pattern facial expression feature extraction scheme based on the depth camera image extraction method. We also construct a multistream convolutional neural network to learn facial 3D features.

The rest of this paper is arranged as follows. Section 2 describes the research related to face recognition. Section 3 details the principles and implementation process of the improved local binary pattern and the multistream neural network. Section 4 presents the experimental datasets and an analysis of the results. Finally, Section 5 reviews our findings and outlines future research.

2. Related Work

The literature [15] was one of the first studies to propose a face recognition system; the authors built a face pixel autocorrelation matrix to fit the pixel computation model around a feature neural network. Face images require manual production of datasets, and each face label needs to be preprocessed and labeled with its data source. The face database requires pixel-level normalization and coordinated positioning for training set and test set classification. However, the method did not perform well enough in the experiments. Researchers in the literature [16] built on previous face recognition studies and proposed a face algebraic database replacement method: face pixel data is converted directly into a computable feature matrix during preprocessing of face data collection, which optimizes the substitutability of face data. Researchers in the literature [17] proposed a face feature residual coding structure to determine the three-point pixel localization of the eyes and nose, which mitigates undesirable unconstrained-environment conditions such as insufficient light.

Considering that face recognition is strongly influenced by unstructured environmental factors, researchers in the literature [18] experimentally analyzed the efficiency of principal component analysis methods and compared the range of action of Fisher faces and local binary patterns. They found that, for the pixel computation of grayscale maps, the monotonic transformation matrix and the face pixel features cannot be converted with a uniform rotation matrix; however, the pixel features obtained from the face pixel texture distinguish facial features more accurately. The researchers of the literature [19] explained the shortcomings of the local binary pattern approach through experimental data and presented the superiority of the Viola-Jones approach. Considering the effects of unstructured environmental factors, the authors also analyzed various optimization methods proposed in the literature [20, 21], detailing the best solutions for situations such as occlusion, background differences, and noise effects, providing stable technical support for subsequent studies.

Researchers in the literature [22] developed experiments around the effect of lighting changes on face recognition. The authors present common solutions to lighting variation in face recognition and point out the drawbacks of each method. In a later experiment, they proposed a facial light transformation balancing method based on distance transformation and kernel feature extraction. The method is based on facial texture features; for the problem of robust illumination normalization, the authors explain, using facial pixel noise as an example, how the method maintains stable extraction of facial features under illumination transformations. Researchers in the literature [23] also took a noise treatment approach to face recognition and illumination balancing. To address the problem that high-level noise strongly affects facial features and weakens facial edge detection, the authors took inspiration from the backpropagation shear wavelet operation. They first divide the face uniformly into different regions at the pixel level, with each region corresponding to an independent classifier; high-level noise is eliminated by fitting facial features with the same characteristics by similarity at the terminals of the combined face regions, followed by reorganization. The method achieves a noise elimination rate of 86% in experimental validation, which improves the robustness of the face recognition model.

To improve the accuracy of face recognition, the local binary pattern method is the first choice of most researchers, and optimization and fusion based on local binary patterns can improve overall model efficiency. With the emergence of deep learning methods, researchers have tried to incorporate convolutional neural networks into face recognition models. The literature [24] presents a real-time multiperson face recognition method, an embedded system developed on a GPU platform and configured with computer vision scenes, which achieves functions such as face tracking and fast recognition. The method is an example of the successful application of a neural network in a face recognition model, and this system provides great reference value for subsequent face recognition system construction. The researchers in the literature [25] introduced and affirmed earlier research results, elaborating their research ideas and methodological sources from a new perspective. Starting from a binary histogram algorithm, the article decomposes facial expressions between low-level and high-level pixels and extracts high-level and low-level expression features by pixel enhancement. The experimental results demonstrate that the method gives face recognition models good generalizability. The literature [26] analyzes in detail the differences and efficiency of local binary pattern models and support vector machines and incorporates a particle swarm algorithm based on support vector machines; the recognition accuracy of this model improved by 10 percentage points in tests on a public face database. Researchers in the literature [27] focused on the random forest method, choosing facial texture features as a measure of the fusion of sign and magnitude features. In supplementary experiments on public face datasets, the authors compared the method with the local binary pattern method, and the results show that their method is more efficient in classifying facial expressions.

3. Method

3.1. Face Recognition Pipeline Overview

We have reconstructed the face recognition method and designed a three-dimensional face recognition network. First, we build a digital computer vision scene with three RealSense depth cameras; each camera's RGB image sensor and infrared depth sensor operate in linkage, sharing a frame rate to avoid frame loss. The facial RGB image information and depth information are input to the face recognition model simultaneously, where a physical feature calculation unit first fuses the depth and RGB information to generate 3D facial information. The 3D facial information contains two sets of 3D facial contour features and data-stream face classification theme scores. The RGB image information carries a relatively large weight in the estimated score, and its features are extracted by a specific face estimation algorithm. The different levels of features in the database are generated by iterative updates and are used to correct the weight parameters for the score estimation. Uncertain estimates, falling in the range of small-probability events, are passed directly to the decision layer. Finally, all the features are input to the decision layer, and the decision algorithm produces a feature response for the face based on the subject category and confidence level. The feature response connects directly to the human biometric database and can be associated with real company employee information through a mapping relationship, thus achieving face recognition. The composition of the face recognition system is shown in Figure 1, and a sketch of the fusion step follows.
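To make the first fusion step concrete, the sketch below stacks an aligned RGB frame and depth map from one camera into a single four-channel tensor. This is a minimal illustration under our own assumptions (synthetic frames, depth already registered to the RGB image), not the exact pipeline of the physical feature calculation unit.

```python
import numpy as np

def fuse_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Fuse an aligned RGB frame (H, W, 3) and depth map (H, W) into a
    (H, W, 4) tensor. Assumes depth is registered to the RGB frame."""
    rgb = rgb.astype(np.float32) / 255.0          # color to [0, 1]
    depth = depth.astype(np.float32)
    span = depth.max() - depth.min()
    depth = (depth - depth.min()) / (span + 1e-8)  # depth to [0, 1]
    return np.concatenate([rgb, depth[..., None]], axis=-1)

# Synthetic frames standing in for one RealSense camera:
rgb_frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
depth_frame = np.random.randint(300, 1200, (480, 640), dtype=np.uint16)
print(fuse_rgbd(rgb_frame, depth_frame).shape)  # (480, 640, 4)
```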

3.2. Local Binary Pattern Network Reconstruction

Most researchers have demonstrated that local binary pattern neural networks give good results in the field of face recognition. We have also experimentally verified the efficiency of the local binary pattern framework while noting its shortcomings in accuracy and network structure. In our work, we use the local binary pattern as the basis of the framework and focus on improving the recognition accuracy and speed of face recognition systems in unstructured environments. We also optimize feature details at the pixel level and generate facial attribute feature points adaptively.

In the first layer of the face recognition network, we set up the image information preprocessing layer. Let α represent the face image contrast adjustment parameter and β the face contour size scale value. During experimental testing, we tuned the α and β values by successive approximation and found that the image preprocessing effect is best when α = 1.5 and β = 0.3. The preprocessing layer function relationship is
$$g(x, y) = \alpha f(x, y) + \beta,$$
where $f(x, y)$ is the input pixel value and $g(x, y)$ is the preprocessed output.
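A minimal sketch of this layer follows, assuming the preprocessing is the standard linear point operation above; the paper does not specify its exact implementation, and OpenCV's `convertScaleAbs` is our stand-in for it.

```python
import cv2

# Sketch of the preprocessing layer, assuming the standard linear
# mapping g(x, y) = alpha * f(x, y) + beta applied per channel.
ALPHA = 1.5  # contrast adjustment parameter (paper's best value)
BETA = 0.3   # offset parameter (paper's best value)

def preprocess(face_bgr):
    # convertScaleAbs computes saturate(alpha * src + beta).
    return cv2.convertScaleAbs(face_bgr, alpha=ALPHA, beta=BETA)

img = cv2.imread("face.jpg")  # hypothetical input frame
pre = preprocess(img)
```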

In our previous experiments, we found that when a convolutional neural network filter processes a face RGB image, each convolution operation biases the response toward higher pixel values in the region around the specified pixel, and the effect changes with the type of filter as it passes over each region of the image. Considering this problem, we borrowed filter design rules from other studies, most of which show that the more similar the filter features are to the image features being convolved, the higher the probability of filter activation and the more accurate the filtering results. To this end, we set up independent experiments to verify the feature similarity of the Gaussian blur filter [28], the median filter [29], and the bilateral filter [30] in the face recognition framework. The experimental results show that the bilateral filter obtains filtering results most similar to the facial image features. Therefore, we place the bilateral filter in the second layer of the face recognition network. The bilateral filter is
$$\hat{I}(x) = \frac{1}{W(x)} \sum_{y \in \Omega(x)} w(x, y)\, I(y), \qquad W(x) = \sum_{y \in \Omega(x)} w(x, y),$$
where $w(x, y)$ represents the weighting function of the facial RGB input in the convolution calculation, $y$ ranges over the pixels in the neighborhood $\Omega(x)$ of the filtered image area, and $W(x)$ represents the linear summation of the filtering weighting function; $\hat{I}(x)$ is the result of the bilateral filter applied on a $(2N + 1) \times (2N + 1)$ neighborhood. We further let $\tilde{I}$ denote the fusion result of the input facial image with the noise reduction output, used as follows.
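As a quick illustration of this second layer, the following sketch applies OpenCV's bilateral filter to a face image; the parameter values are placeholders, not the paper's settings.

```python
import cv2

# Edge-preserving bilateral filtering. d is the neighborhood diameter
# (2N + 1); sigmaColor and sigmaSpace control the range and spatial
# components of the weighting function w. Values are illustrative.
img = cv2.imread("face.jpg")
filtered = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
```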

We regard $\tilde{I}$ as the image noise control contrast weighting, where $C$ represents the facial image contrast and $k$ represents the type of filter used in the convolution calculation. In pixel equalization experiments on facial images, we found that facial images are prone to global low-level noise. To avoid this problem, we adopt the image histogram method and perform a parameter weighting operation based on it; the normalization can be written as
$$H'(v) = \operatorname{round}\!\left(\frac{\operatorname{cdf}(v) - \operatorname{cdf}_{\min}}{M \times N - \operatorname{cdf}_{\min}} \times (L - 1)\right),$$
where $H'$ represents the linear normalization operation, $\operatorname{cdf}$ is the cumulative histogram of pixel values, $M \times N$ is the image size, and $L$ is the number of gray levels. Experiments prove that the method can effectively solve the problem of global low-level noise. Following the linear normalization optimization of the researchers in the literature [31], the local binary pattern behaves more stably in a fixed neighborhood window; its principle of action is shown in Figure 2.
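The sketch below mirrors the $H'(v)$ mapping directly in NumPy; it amounts to classic CDF-based histogram equalization, and treating it as the paper's exact weighting scheme would be an assumption.

```python
import numpy as np
import cv2

# CDF-based linear normalization H'(v) over 256 gray levels.
gray = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
hist = np.bincount(gray.ravel(), minlength=256)
cdf = hist.cumsum()
cdf_min = cdf[cdf > 0].min()
lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255)
lut = np.clip(lut, 0, 255).astype(np.uint8)
equalized = lut[gray]  # closely matches cv2.equalizeHist(gray)
```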

We also used the local binary pattern operator as a basis; its underlying mathematical equation is
$$\mathrm{LBP}_{P,r}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(z) = \begin{cases} 1, & z \ge 0, \\ 0, & z < 0, \end{cases}$$
where $g_c$ represents the threshold of the center face pixel, $g_p$ represents the intensity of the neighboring pixels with the face center pixel as the dispersion point, and $p$ indexes the $P$ sampled neighbors on a circle of radius $r$ around the center pixel.
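A compact NumPy sketch of the basic 8-neighbor, radius-1 operator follows; production code would more likely use `skimage.feature.local_binary_pattern`, and this is the textbook form rather than the paper's exact variant.

```python
import numpy as np

def lbp_8_1(gray: np.ndarray) -> np.ndarray:
    """Basic LBP with P = 8 neighbors at radius r = 1.
    Each neighbor >= center contributes 2**p to the code."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]  # center pixels g_c
    # Offsets (dy, dx) walking the 8-neighborhood counterclockwise.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for p, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy : g.shape[0] - 1 + dy,
                     1 + dx : g.shape[1] - 1 + dx]
        code += ((neighbor - c) >= 0).astype(np.int32) << p
    return code.astype(np.uint8)
```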

3.3. Multistream Convolutional Neural Networks

Convolutional neural networks have high generalizability and robustness in image processing. In a face recognition system, the neural network model can take full advantage of weight sharing and local connectivity; in the process of facial contour feature extraction and recombination learning, reconstruction of facial appearance pixel features is avoided and computational cost is reduced. As mentioned in the computer vision reconstruction, we use three depth cameras to reconstruct the computer vision scene and capture face features from different angles on the same horizontal plane. To mirror this computer vision scene, our multistream convolutional neural network processes the facial features from each angle separately. All integrated features are separated by biological category, local detail features of the face are learned by convolution, and the number of feature map parameters is then reduced by pooling layers; global average pooling is shown in Figure 3. Finally, global information is generated by fully connected layers, and we choose a nonlinear activation function to classify the output biological category feature mapping values. A sketch of this architecture is given below.
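The following minimal Keras sketch shows one way to build such a three-stream network; layer sizes, depths, input resolution, and the number of identity classes are illustrative assumptions, not the paper's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 100            # hypothetical number of enrolled employees
INPUT_SHAPE = (128, 128, 4)  # fused RGB-D patch per camera (assumed)

def stream(x):
    # One branch per camera angle: convolution learns local detail
    # features, pooling shrinks the feature maps.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D()(x)
    return x

inputs = [layers.Input(INPUT_SHAPE) for _ in range(3)]  # three cameras
merged = layers.Concatenate()([stream(i) for i in inputs])
x = layers.Conv2D(128, 3, activation="relu", padding="same")(merged)
x = layers.GlobalAveragePooling2D()(x)       # global average pooling
x = layers.Dense(128, activation="relu")(x)  # fully connected layer
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = Model(inputs, outputs)
```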

The disadvantage of multistream convolutional neural networks is that they become cumbersome when their spatial layout is optimized. Considering the fusion of local face features and channel features, and referring to the squeeze-and-excitation (SE) architecture methods in the literature [32, 33], we compress the multistream neural network by setting an excitation function in the middle of the network and resetting the parameters of all neural network layers at the end. This architecture helps establish inter-channel dependencies, enlarges the receptive field, and suppresses facial features with low weight coefficients. Our convolutional neural network architecture is shown in Figure 4.

For multistream convolutional neural network compression, the face spatial depth features are the first input, and the face features are reset and encoded using global average pooling:
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j),$$
where $C$ represents the number of output feature map channels and $H$ and $W$ denote the height and width of the feature map at the pixel level. After the convolutional layer compression, an excitation function is added in the middle of the network, and we choose the sigmoid function $\sigma$ as the excitation function. A gating mechanism is also added to the network to balance the mapping rate between channels:
$$s = \sigma\!\left(W_2\, \delta(W_1 z)\right),$$
where $W_1 \in \mathbb{R}^{(C/r) \times C}$, $W_2 \in \mathbb{R}^{C \times (C/r)}$, and $\delta$ denotes the ReLU activation. To improve the robustness and generalization of the model, the following output is obtained by matching the feature activation values with the original biometric feature library:
$$\tilde{x}_c = s_c \cdot u_c.$$
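A minimal Keras sketch of this SE-style gating (squeeze by global average pooling, excitation through the bottleneck, channel rescaling) follows; the reduction ratio is an illustrative choice.

```python
from tensorflow.keras import layers

def se_block(u, r: int = 16):
    """Squeeze-and-excitation gating over a feature map u of shape
    (batch, H, W, C), following s = sigmoid(W2 relu(W1 z))."""
    c = u.shape[-1]
    z = layers.GlobalAveragePooling2D()(u)          # squeeze: z_c
    s = layers.Dense(c // r, activation="relu")(z)  # W1 and ReLU (C/r)
    s = layers.Dense(c, activation="sigmoid")(s)    # W2 and sigmoid gate
    s = layers.Reshape((1, 1, c))(s)
    return layers.Multiply()([u, s])                # x~_c = s_c * u_c
```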

We also added a two-layer bottleneck structure at the tail of the network: the first layer is a fully connected layer with dimensionality reduction parameter r, and the second layer is a ReLU activation layer; each face RGB channel has a corresponding position and original input activation value. The multistream convolutional neural network face recognition process is shown in Figure 5.

4. Experiment

4.1. Setting

According to the project requirements, the digital face recognition system contains a computer vision unit, a deep convolutional neural network algorithm unit, a GPU-accelerated computing unit, a data storage unit, and a PC control unit. For the computer vision unit, we use the stereo system architecture platform AVT Pike to assist face image and depth information acquisition, with the cameras placed at 45 degrees and kept at the same level: camera No. 1 faces the face directly, and cameras No. 2 and No. 3 are spaced 45 degrees to either side of camera No. 1. The distance between the cameras and the face is 1000 mm, and the effective range of the three cameras acting together is 500 mm. We use IEEE 1394 FireWire to connect the lenses of the three cameras to the central console. The details of the computer vision system build are shown in Figure 6. The deep convolutional neural network unit uses TensorFlow as the base network framework, and the algorithm editing language is Python. The face recognition model training platform is an NVIDIA Quadro P6000 graphics card, the computer memory is 64 GB, and the processor is an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz × 8. The network training process uses a hierarchical training model with layer-by-layer update iterations to optimize the training parameters.

4.2. Analysis of Face Area Weighting

To divide the face at the 3D level, we split the face by region into its major features (eyes, eyebrows, nose, mouth, and ears) and analyze the degree of biometric matching of each region. According to the facial region association, we strictly control the light angle in the experiment to obtain the shading of each facial region. Since multistream convolutional neural network feature learning is biased toward light reflection and motion blur, the facial images are captured from three angles. To account for the uncertainty caused by unstructured environmental factors in image acquisition, we iteratively update the feedback features of the facial regions during training.

From the experimental results in Table 1, it can be seen that in the facial region partitioning experiments, the eyes occupy the largest weight proportion, and the recognition accuracy for the eyes is the highest. The weight proportion of the mouth is second only to the eyes because changes in mouth shape affect the overall configuration of the face contour. The other facial regions occupy smaller proportions and have little influence on recognition. In summary, face recognition accuracy is most closely related to the biometric features of the eyes and mouth, and post-training parameter adjustment of the neural network should take the higher-weight facial regions as the main reference factors.

4.3. Face Recognition Experiment Results

To verify the validity of the method, we selected public datasets as the experimental validation set: the PubFig (Public Figures Face Database), the Large-scale CelebFaces Attributes (CelebA) dataset, and ColorFERET. To keep the datasets suitable for the face recognition model, we applied the same preprocessing to all of them. The preprocessing process is shown in Figure 7.

At the input side, all datasets are converted to 384 × 286 pixel grayscale images for testing frontal face angle correction; for RGB correction, all datasets are set to 384 × 286 pixel channel-controllable images. The preprocessed output is stored in the format "FACE_XXXX.pgm." Details of the datasets are shown in Table 2, and a sketch of this conversion step follows.
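The small sketch below performs this conversion under the stated conventions (grayscale, 384 × 286, .pgm output); the directory layout and naming counter are our assumptions.

```python
import cv2
from pathlib import Path

# Convert raw dataset images to 384 x 286 grayscale .pgm files named
# FACE_XXXX.pgm. Source and destination paths are hypothetical.
SRC, DST = Path("raw_faces"), Path("preprocessed")
DST.mkdir(exist_ok=True)

for i, path in enumerate(sorted(SRC.glob("*.jpg"))):
    gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    gray = cv2.resize(gray, (384, 286))  # (width, height)
    cv2.imwrite(str(DST / f"FACE_{i:04d}.pgm"), gray)
```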

In parallel experiments, we conducted independent tests of support vector machines (SVM) [34], local binary patterns (LBP) [35], and convolutional neural networks (CNN) [36]. To balance the variability between methods, we used a closed-form parameter-adaptive correction method for each and migrated each model to face recognition training once its parameters reached the optimal state. The comparative experimental results are shown in Table 3.

In Table 3, ATP denotes the recognition efficiency per 100 face samples, P denotes precision, and R denotes recall. From the experimental results, it can be seen that the SVM method produces a large number of misidentified samples in the face recognition dataset test, with a precision of only 77%. Compared with machine learning methods, deep learning methods perform better in face recognition, and the convolutional neural network method is more advantageous in image recognition, reaching 92% accuracy. Our method combines a deep learning network with the LBP algorithm framework and achieves 98% accuracy, better than the other algorithms, showing that our method performs best in face recognition. To assess the influence of interaction between network layers, we also added independent network layer performance tests, with the results shown in Table 4.
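For reference, the sketch below computes the precision and recall reported in Table 3 using scikit-learn; the label arrays are hypothetical stand-ins for ground-truth identities and model predictions.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical ground-truth identities and model predictions.
y_true = [0, 1, 1, 2, 2, 2, 0, 1]
y_pred = [0, 1, 2, 2, 2, 2, 0, 1]

# Macro-averaged precision (P) and recall (R) over identity classes.
p = precision_score(y_true, y_pred, average="macro")
r = recall_score(y_true, y_pred, average="macro")
print(f"P = {p:.2f}, R = {r:.2f}")
```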

From the experimental results in Table 4, we can see that the multistream convolutional neural network has outstanding advantages in the layer-by-layer stacking experiments, and the best face recognition results are obtained when the number of multistream branches is 4. When the number of branches exceeds 4, accuracy starts to decrease. In the final experiments, we therefore adopt the training scheme with 4 multistream branches.

5. Conclusion

In this paper, to reduce the cost of human resource management and improve the efficiency of enterprise digital management, we propose a face recognition method based on a multistream convolutional neural network and local binary patterns and build a digital face recognition management system. We first build the system's computer vision scene. Then a local binary pattern facial expression feature extraction scheme is designed according to the depth camera image extraction method. Because 3D facial features are easily missed, we build a multistream convolutional neural network to learn them. Finally, we validate the effectiveness of the method on selected public face datasets. Experiments show that our method reaches 98% face recognition accuracy, significantly better than the other methods tested. We also verified the effectiveness of the multistream convolutional neural network through ablation experiments.

Although our method performs well in terms of accuracy, the face recognition system is deployed on an embedded standalone system, and the model our method implements places high demands on that system's hardware, which lowers the stability of fusing the model with the standalone system. In future research, we will work on reducing the algorithm's parameters, lowering its computational cost, and improving its compatibility with the standalone system.

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.