Abstract

Convolutional neural network (CNN) is a very important method in deep learning, which solves many complex pattern recognition problems. Fruitful results have been achieved in image recognition, speech recognition, and natural language processing. Compared with traditional neural network, convolutional weight sharing, sparse connection, and pooling operations in convolutional neural network greatly reduce the number of training parameters, reduce size of feature map, simplify network model, and improve training efficiency. Based on convolution operation, pooling operation, softmax classifier, and network optimization algorithm in improved convolutional neural network of LeNet-5, this paper conducts image recognition experiments on handwritten digits and face datasets, respectively. A method combining local binary pattern and convolutional neural network is proposed for face recognition research. Through experiments, it is found that adding LBP image information to improved convolutional neural network of LeNet-5 can improve accuracy of face recognition to 99.8%, which has important theoretical and practical significance.

1. Introduction

An image is a visual form that describes state of objective world or transforms energy into a two-dimensional plane. Image recognition is based on image recognition algorithm, which is a technology for computer to analyze specific types of objects contained in images [1]. Image recognition is used in many fields, such as biomedical pathological diagnosis, military dangerous environment reconnaissance, and intelligent portrait recognition in field of life services. Images contain a lot of useful information for human beings [2]. Effective image recognition is of great significance to progress of whole society.

Image data not only contains information about human beings but also brings huge profits to companies that master image data. The technology giant Google Inc. used data to bring about $54 billion in revenue in the United States in 2009. This makes people see importance of mining information from images. With rapid development of modern industries, image data has exploded and accumulated at an unprecedented speed. How to extract information from images has become a problem worthy of study. With deepening of information mining, more and more algorithms are used to mine information in images [3]. Machine learning is known for its efficient mining ability, adaptive ability, and learning ability. It is an algorithm with excellent performance suitable for image processing and big data. The convolutional neural network algorithm is an excellent deep learning algorithm specially designed to process image data.

The idea of neural network is to let machine learn way of human thinking to simulate way that humans obtain information from nature to deal with problems. Based on this, scientists put forward artificial neural network theory (artificial neural network for short ANN) [4, 5] proposed back propagation (BP) algorithm of artificial neural network. The BP algorithm was applied to shallow forward neural network model, and a hidden layer was added to solve XOR gate problem that perceptron could not solve [6]. The computational complexity of optimization problem is reduced, and the algorithm becomes most basic algorithm of neural network. From then on, research and application of neural network began to recover. In the 1860s, concept of convolutional neural networks was proposed. Reference [7] found that visual image of a small area in cat’s visual cortex is only processed by one neuron, which is receptive field. Reference [8] proposed that this is first prototype of a convolutional neural network, in which visual system is modeled and can be recognized when objects are displaced or even slightly deformed. The neurocognitive machine proposed by scientists contains two types of neurons, one neuron S-cells extract features to filter convolution kernel of convolutional neural network, and other neuron C-cell is antideformation for convolutional neural network. Activation function and pooling: convolutional neural networks are developed on basis of multilayer perceptrons. Reference [9] proposed to be derived based on an efficient BP training method and first achieved success in field of English handwriting recognition. Convolutional neural network is the first artificial neural network that has been successfully trained in deep learning [10]. Subsequently, various aspects of convolutional neural network have been continuously improved, making it a most potential and successful model. Convolutional neural network is widely used. There are applications in face recognition, pedestrian detection, natural language processing for robot navigation, medicine discovery, disaster climate discovery, and even Go AI programs [11]. At present, there are many research studies on convolutional neural network in world, mainly aiming at improving the structure of convolutional neural network. Domestic research is still in its infancy.

Image recognition is the basis of image classification and detection, and correct image classification results are of great significance to development of computer vision. Image recognition is a research that analyzes image categories by extracting feature information of images. Because image information is very complex, it is particularly important to select effective features [12]. Although algorithms for feature extraction are continuously proposed, traditional algorithms are difficult to meet performance and efficiency requirements of massive images. After more than 50 years of development in image recognition, a large number of excellent machine learning algorithms have emerged [13]. Shallow neural network algorithms such as artificial neural network (ANN), support vector machine model (SVM), Gaussian mixture model (GMM), and conditional random field (CRF) are correct in classifying a series of objects such as fruits, rock formations, and peanuts rate of more than 90%. The shallow neural network also has problems in obtaining good classification [14]. For example, if you need to manually select features, training effect after deepening number of layers is not ideal [15]. Deep learning is a deep network algorithm that automatically learns image features [16]. Deep learning is divided into supervised learning (Supervised Learning) and unsupervised learning (Unsupervised Learning) according to whether there are labels [17].

2.1. Convolutional Neural Network Model

More and more scholars in various countries have joined in the research of machine learning and various high-tech companies. Machine learning is becoming more and more intelligent and has made outstanding achievements in various fields, and its results are continuously applied in practice. Figure 1 shows the application of deep learning in various fields.

Neuron is a basic processing unit of neural network, generally has multiple inputs and one output. The basic structure model is shown in Figure 2.

The convolution operation has two excellent properties: sparse connection, that is, in convolution operation, one input neuron is only connected to corresponding part of output neurons, and one output neuron is only connected to corresponding part of input neurons. Figure 3(a) shows the sparse connection method of convolution, and Figure 3(b) shows the full-connection method. It can be clearly seen that each output neuron is connected to all input neurons in the full-connection method, while in convolution sparse connection method, an output neuron is only connected to corresponding part of input neurons, for example, output neuron s3 is only connected to input neurons x2, t3, and x4.

Parameter sharing, that is, parameters of each part of convolution operation are shared. Figure 4(a) shows the convolution operation, and Figure 4(b) shows full-connection operation. It can be seen that weight parameters of each connection in full-connection operation are different. Therefore, when the number of input and output neurons is 5, the required number of parameters is 25. In convolution operation, because of the nature of parameter sharing, such as weight parameters of x and s, connection is same, so when the number of input and output neurons is 5 and size of convolution kernel is 3, the number of parameters required is 3.

The convolutional neural network inputs an image, and when image is only black and white, the image can be abstracted into a two-dimensional matrix in space. When image is color, there are three color channels, and nodes of each layer of neural network are composed of three RGB vectors, which can abstract image into a three-dimensional matrix in space. Figure 5 is a schematic diagram of connection of convolutional neural network nodes. It can be seen from figure that multiple nodes in the upper layer of convolutional neural network are connected to a single node in the lower layer of neural network.

3. Softmax Algorithm

In softmax, what needs to be solved is a multiclass problem (assuming k classification problem), so value range of class label y is k different integer values, and different values represent different classes. For training set with m samples, there are , of which . Given a sample , we can use hypothesis function to calculate probability value that sample belongs to each category j, that is, estimate sample for each category probability of outcome. So, suppose output of function is a k-dimensional numeric vector (the vector element values sum to 1), where component of vector represents estimated probability value of sample belonging to class. Suppose function is of the following form:where is a model parameter. If a matrix is used to represent parameter , there are following: .

The cost function of softmax algorithm is as follows:where is an indicative function. For minimization problem of , there is no closed-form solution, and generally an iterative optimization algorithm (such as gradient descent method) is used. After derivation, gradient formula is as follows:

In the gradient descent method, each iteration needs to be updated as follows: .

Usually, softmax cost function will add weight attenuation term [18], which can solve numerical problem caused by parameter redundancy of softmax. The cost function with weighted attenuation term is as follows:where is the weight attenuation coefficient. Its gradient is calculated as follows:

Suppose a labeled sample set contains M samples. The following equation describes the use of batch gradient descent method to solve neural network model. For a single sample , cost function is given as follows:

Then, cost function of whole m samples is defined as follows:

The gradient descent method updates parameters and B according to the following formula:where is the learning rate, and the most critical step is to calculate partial derivative. Let’s introduce effective calculation method of partial derivative: back-propagation algorithm.

4. Results

We trained five different convolutional neural networks, and relationship between loss value and the number of iterations during training process is shown in Figure 6. In order to clearly see difference of each curve, we discard selection process of first 1000 times because their loss value is relatively large. The figure depicts training process from 1000 to 10000 epochs. It can be seen from the figure that when the number of generations reaches 3000, the loss value of network training has achieved a better effect. Among all networks, LeNet-E network can achieve lowest loss value, indicating that in case of fitting, parameters learned by its network structure can better express image features, that is, deeper network structure can better fit data features. It can be found from the figure that network loss value of LeNet-D and LeNet-E networks is very unstable during training. It shows that although dropout strategy helps to improve recognition rate of network, it affects stability of network training.

We also plot the relationship between test accuracy and number of iterations, as shown in Figure 7. It can be found that LeNet-D and LeNet-E networks have higher recognition rates than other three networks, indicating that the depth of network and dropout strategy have a greater impact on recognition rate. From the curve of LeNet-A, it can be found that with the increase of number of generations, recognition accuracy has a downward trend, which may be overfitting. When the width of network is not enough to represent characteristics of input data, the recognition rate of test set will decrease with the increase of number of network selections.

5. Conclusion

Convolutional neural network is a very important feature learning model in machine learning. It has strong representation ability in image recognition and has been widely used in the field of pattern recognition. This paper conducts in-depth research on convolutional neural networks, introduces in detail network structure of convolutional neural networks and specific operations of convolution and pooling, and introduces back-propagation algorithm used in network optimization. Based on LetNet-5 network model, five different network structures are constructed for handwritten digit recognition. The effects of network depth, number of hidden layer convolution kernels, and feature dimension on recognition accuracy are tested. The experimental results show that deeper and wider network structure is more conducive to improvement of recognition rate, larger feature dimension, and better recognition rate. Finally, a face recognition method based on local binary pattern and convolutional neural network is proposed, which uses stitching of original input image and local binary pattern image as the input of convolutional neural network, so that convolutional network not only learns the global features of face but also local structural features of face. Through experiments, it is found that added LBP information helps to further improve face recognition rate.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declared that they have no conflicts of interest regarding this work.