Abstract

Unlike traditional image recognition technology, deep learning (DL) can automatically extract features and improve recognition accuracy by combining feature extraction with classification. This article discusses the challenges and shortcomings of traditional image recognition methods, together with the development process and research status of DL. Related theories of DL-based image recognition are presented, DL's basic models and methods are analyzed, and related image data sets are examined experimentally. Furthermore, because DL is typically applied to large sample sets, this paper proposes an improved algorithm for small samples, as well as a DNN-based analysis model for the evolution of ancient large figurine images. Compared with the traditional neural network model, this model accelerates the network's convergence and reduces training time to a certain extent, improving the image recognition rate while lowering the error rate.

1. Introduction

Artistic genius was exerted on ritual and sacrificial vessels, and this formed a tradition with far-reaching influence [1]. Large-scale figurines in ancient China are highly decorative: whether figures or animals, whether decorative figurines or Buddhist figurines, they all carry a strong decorative meaning [2]. Decorative art is different from real life, yet it is the ubiquitous artistic reality in human life. Since the Southern and Northern Dynasties, with the development of Buddhist art, many large Buddhist figurines with decorative elements appeared; although they belong to religious themes, they were still created on the basis of real life [3]. During the Sui and Tang Dynasties, decorative figurines reached an unprecedented peak. The decorative patterns of the figurines show their strong sense of decoration in the best way, full of Chinese style. China has its own historical and cultural characteristics, as well as its own artistic characteristics. Therefore, the precise identification and analysis of the image evolution of ancient large figurines has very important research significance [4, 5]. Although the deep neural network (DNN) [6, 7] has achieved great success in many pattern recognition tasks, many problems remain to be solved urgently. On the one hand, although DNN performs remarkably in image classification and other tasks, it still makes some mistakes that human beings would never make. On the other hand, researchers still do not know how a DNN learns effective feature representations from large-scale data. For this reason, it is necessary to strengthen research on related content and further promote the development of related fields.

DL is now widely used in the field of image recognition and has produced impressive results. With the rapid development of computer technology, the imaging speed and resolution of imaging equipment have greatly improved, and the accompanying image data of ancient large-scale figurines play an increasingly important role in figurine evolution analysis. Deep visualization technology aims to visualize the feature representation learned by a single neuron, feature map, or convolution layer in a DNN by generating images; it helps researchers understand how neural networks learn to extract features from data and analyze the strengths and weaknesses of the extracted features, in order to design better network structures that exploit the strengths and avoid the drawbacks [8]. In the field of computer vision, image recognition of ancient large-scale figurines has become a research hotspot. This paper constructs an image evolution analysis model of ancient large-scale terracotta figures based on DNN and discusses the research and application of DL in image recognition.

With the progress of society and the development of science and technology, DL has become one of the latest trends in machine learning and artificial intelligence and one of the hottest research directions today [9]. The development of DL has had a great impact on research in computer vision and machine learning [10]. DL is a new field in machine learning research. It is a kind of unsupervised learning that simulates the cognitive process of the human brain and constructs a structured model to extract features [11]. The whole training process needs only a computer, without manual participation, and good features can be extracted to obtain the desired image recognition effect [12]. At present, most existing image recognition methods for ancient large-scale figurines are machine learning methods. The process is not only cumbersome but also requires manual design of features, resulting in poor recognition and time-consuming training [13]. DL can simulate the hierarchical structure of the human brain's nervous system, integrate feature extraction and classification, realize automatic extraction of complex features, and has strong data representation ability. New DL technologies are constantly emerging and have had a far-reaching impact on people's lives. By studying the application of DL in image recognition, this paper explores the connotation of its development. Combining DL with image recognition of ancient large-scale figurines, a recognition model is established, focusing on how to improve the accuracy and speed of figurine image recognition. It is hoped that the application effect of DL can be further improved, so that it can play a greater role in the field of image recognition of ancient large figurines.

When studying the problem of image recognition, literature [14] proposed a CNN model trained with the BP algorithm. According to literature [15], the temporarily incomprehensible "black box" characteristics of DNN make it difficult for researchers to use the neural network (NN) to solve problems, and they also reflect the risk of neural networks in actual applications. A fine-tuned network's convolution kernels, once retrained, acquire richer phase characteristics, resulting in improved classification performance. Literature [16] discovered that the convolution kernels in CNN have a large number of dual characteristics and, based on this, designed a new type of activation function, the serial nonlinear activation function, which greatly reduces the number of network parameters while maintaining network performance. According to literature [17, 18], among the network's parameters some neurons respond strongly to specific types of images, such as human faces. According to literature [19], the development of machine learning is divided into two stages: the shallow learning stage and the deep learning stage. Literature [20] proves that CNN can be extended to two-dimensional and three-dimensional image analysis tasks and that randomly rotated sampling of the data set can improve the classification performance of CNN. Literature [21] holds that the biggest advantage of CNN is reducing the number of network parameters through receptive fields and weight sharing. Literature [22] found that if some noise completely unrecognizable to the human eye is added to a picture that an NN can correctly identify and classify, the NN may mistakenly recognize the image as another category. Literature [23] found that even the current best-performing NN will, for certain specially generated images, believe with a confidence of more than 99.99% that they belong to a particular category, although these images are pure noise in the eyes of humans. Literature [24] found, by directly displaying the convolution kernels in CNN, that their phase information has a lot of redundancy. Literature [25] optimized a noisy image to maximally activate a certain type of neuron in the probability layer of the network, revealing the concept the NN learned from the training data. Literature [26] also uses this method to show the image features represented by each filter response in the network. Literature [27] extended the maximum-activation visualization method to the middle-layer neurons of the NN, which can not only reveal the image features learned by neurons in different layers but also generate striking, dream-like images. This paper studies the application of deep learning in the image recognition of ancient large figurines. Based on CNN, an adaptive CNN model and a multiscale adaptive CNN model are proposed, which significantly improve the recognition accuracy of ancient large figurine images and the generalization of network performance.

3. Methodology

3.1. Large Ancient Figurines

The softness and massiness of ancient Chinese large figurines are closely related to the temperament, living conditions, geographical environment, philosophy, ethics, and other cultural factors of the Chinese nation. Art is pursued in the same way, manifested in the implicit beauty and inner beauty of the plastic arts, and large-scale figurines are no exception. Ancient Chinese large-scale figurines give the impression of being less outwardly expressive and vivid than western classical large-scale figurines. There is no flamboyance; rather, as in Chinese calligraphy and painting, the power is wrapped inside, giving people more to savor. Examples include the Terracotta Warriors and Horses in the Mausoleum of the First Qin Emperor and the singing and dancing female figurines of the Han and Tang Dynasties.

The decoration of large figurines has a strong totemic meaning and religious significance; it contains people's wishes and imaginations while also carrying the creators' spiritual pursuit. It has a profound mystery and bears the birthmark of arts and crafts. The solemnity of Buddhist figurines can also be enhanced through decoration. Because of their decorative, non-realistic components, Chinese Buddha figurines have a kind of superhuman mystery, yet they also convey a kind of kindness. Decoration has this effect because it differs not only from the reality of life but also from the artistic reality that Chinese people see everywhere in their lives [28]. Furthermore, exaggerated and deformed artistic techniques were frequently used in ancient China's decorative large-scale figurines. The stone lions, for example, used the art of deformation and decoration, combining the carving and patterns of bronze and jade articles to give the stone animals a more powerful and invincible appearance and to better serve the decorative function of figurines.

In ancient China, the majority of large-scale figurines represent people's spiritual pursuits and ideal desires; they are idealistic figurines. Unlike western large-scale figurines, ancient Chinese large-scale figurines pay little attention to realism, imitation, and the reproduction of nature, yet they are profound and leave people with more to savor. The mighty and brave guardian figures of the Tang Dynasty, for example, have this effect, with their power stored within rather than displayed outwardly. Ancient Chinese large-scale figurines pursued inner beauty, primitive beauty, and a philosophical spiritual realm. From the standpoint of western art, these pursuits are difficult to comprehend. As a result, it is critical to improve one's cultivation in traditional Chinese culture in order to fundamentally comprehend Chinese art and the connotation of traditional Chinese culture.

Ancient Chinese large figurines are the product of national culture, with profound cultural connotation and strong nationality. The characteristics of large-scale ancient figurines do not exist independently but are integrated and interrelated with each other. When appreciating the ancient large-scale figurines, we should integrate all the features before we can discover the existence value of figurine art.

3.2. Analysis of DL in Image Recognition

With the advancement of science and technology and the diversification of human social activities, image recognition technology is becoming more widely used in daily life. From the standpoint of modern scientific and technological development, rapid innovation will inevitably provide a more comprehensive application platform for image recognition technology, while also imposing higher and stricter requirements on it [7]. Deep learning is a critical technical tool in the field of image recognition and has great potential there. At the same time, accurately identifying and analyzing the evolution of ancient large-scale figurine images based on DNN is critical for research: it can aid the cultivation of traditional Chinese culture, the fundamental understanding of Chinese art, the comprehension of traditional Chinese culture's connotation, and the promotion of its charm. Preprocessing, region-of-interest segmentation, feature extraction, and classifier design are the four main processes in the traditional image recognition method for large figurines, as shown in Figure 1.

With the development of information technology, all aspects of society are becoming more intelligent and personalized. Machine learning is an important discipline within artificial intelligence, and deep learning (DL), as its hottest branch, has steadily gained momentum in academia and industry in recent years, making significant breakthroughs in complex tasks such as computer vision, speech processing, and text processing. DL is one of the methods of machine learning, originating from research on artificial neural networks. Its basic concept is to combine simple features to create more complex and abstract features; it is a learning method based on data representation. The computer iteratively updates the parameters between the DL network's levels, so that the training result approaches the true value and the training goal is achieved. A DL model is a multilevel network structure in which each layer contains a large number of neurons; it can be regarded as a simulation of the human brain's neural structure, similar to the traditional neural network structure. DL has such powerful representation ability because it can extract features from low level to high level from a large amount of data. If the amount of data is too small, the model will struggle to learn enough features, and the recognition effect will be inferior to that of traditional machine learning. The neural network structure is shown in Figure 2.
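As a concrete illustration of the layered structure just described, the following is a minimal sketch of a fully connected network's forward pass. The layer sizes and the sigmoid activation here are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def sigmoid(u):
    """Logistic activation; squashes each unit's input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, weights, biases):
    """Propagate input x through a stack of fully connected layers."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # each layer: activation of weighted sum plus bias
    return a

# Hypothetical 4-5-3 network; the sizes are arbitrary.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)), rng.standard_normal((3, 5))]
biases = [np.zeros(5), np.zeros(3)]
y = forward(rng.standard_normal(4), weights, biases)
```

Each layer re-represents the previous layer's output, which is the "low level to high level" feature extraction the text refers to.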

The vigorous development of DL has brought positive effects to many fields. For example, the convolutional neural network (CNN) has advanced image recognition technology, and the introduction of DL has also contributed greatly to improving the accuracy of speech recognition. DL has likewise achieved significant results in vehicle detection and traffic sign recognition, even surpassing humans in traffic sign classification. At present, DL has made significant progress in figurine image classification and representation learning, and many researchers have achieved good results. However, many questions about DNN itself remain unresolved. Current deep network research resembles a trial-and-error process: adjusting the number of layers, the number of neurons in each layer, and the parameters of network optimization in search of ever better results. This approach requires rich parameter-tuning experience, consumes a great deal of time, and is not suitable for most researchers. DL is developing rapidly and is widely used in various fields, but research on DL is still at a developmental stage, and many problems remain to be solved.

3.3. Research on Image Recognition Technology of Large Figurines Based on DL

To develop an intelligent society, computers must be able to observe, discover, and think like people in order to analyze and solve problems automatically. In the past, humans first analyzed problems and provided solutions in the form of programs, which were then realized with the help of computers’ efficient and accurate calculations. However, in real life, there are still some issues that humans find difficult to analyze or for which the analysis effect is insufficient. Earlier machine learning algorithms used supervised learning, whereas deep learning (DL) mostly used unsupervised or semisupervised learning. As a result, it only requires a small amount of manual intervention and can use a large amount of unlabeled data effectively. It is better suited to today’s large-scale computing systems and the field of figurine image recognition with large amounts of data.

After establishing the complex model and objective function for the data, it is necessary to find the minimum or maximum of the objective function through the optimization method, so as to update the model parameters, so that it can effectively fit the input-output relationship of the training data and has good generalization ability in the test data.

The CNN back propagation algorithm is based on gradient descent, and each iteration is divided into two steps: forward propagation generates the output and computes the error, and back propagation adjusts the weights. For sample $n$, the error is

$$E_n = \frac{1}{2}\sum_{k=1}^{c}\left(t_k^n - y_k^n\right)^2,$$

where $c$ represents the number of sample classes, $t_k^n$ represents the target value of the $k$-th dimension of the $n$-th sample, and $y_k^n$ represents the $k$-th dimension of the output for the $n$-th sample. In a fully connected network, the structure of layer $l$ can be described as

$$x^l = f\left(u^l\right), \quad \text{where } u^l = W^l x^{l-1} + b^l.$$

The activation function $f(\cdot)$ is generally the sigmoid function, $f(u) = 1/(1 + e^{-u})$.
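These two ingredients, the squared-error objective and the sigmoid activation (whose derivative $f'(u) = f(u)(1 - f(u))$ is the factor back propagation applies to the error signal), can be sketched as follows; the sample values below are illustrative only:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_grad(u):
    """f'(u) = f(u) * (1 - f(u)): what back propagation multiplies into the error."""
    s = sigmoid(u)
    return s * (1.0 - s)

def squared_error(t, y):
    """E_n = (1/2) * sum_k (t_k - y_k)^2 over the c output dimensions."""
    return 0.5 * np.sum((t - y) ** 2)

# Illustrative 3-class example: one-hot target for class 2, y is a network output.
t = np.array([0.0, 1.0, 0.0])
y = np.array([0.1, 0.8, 0.1])
err = squared_error(t, y)  # 0.5 * (0.01 + 0.04 + 0.01) = 0.03
```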

The gradient descent method is one of the most popular optimization algorithms and the main method for optimizing DNN models. It is a first-order optimization algorithm: the parameters of the objective function iterate along the direction opposite to the gradient with a specified step size, so the value of the objective function gradually decreases and approaches a local minimum. For the convolution layer, the convolution process of CNN can be expressed as

$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right),$$

where $M_j$ denotes the set of input feature maps connected to output feature map $j$.
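The convolution formula above can be sketched directly: each output feature map sums the convolutions of its input maps with their own kernels, adds a bias, and applies the activation. The map sizes, kernel sizes, and tanh activation below are illustrative assumptions:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D correlation of input map x with kernel k (no padding)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv_layer_map(inputs, kernels, bias, f=np.tanh):
    """One output map: f(sum_i x_i * k_ij + b_j), as in the convolution formula."""
    acc = sum(conv2d_valid(x, k) for x, k in zip(inputs, kernels))
    return f(acc + bias)

# Two hypothetical 6x6 input maps, each with its own 3x3 kernel -> one 4x4 output map.
rng = np.random.default_rng(1)
inputs = [rng.standard_normal((6, 6)) for _ in range(2)]
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
fmap = conv_layer_map(inputs, kernels, bias=0.1)
```

A different output map $j$ would use a different set of kernels $k_{ij}$ and its own bias $b_j$, which is why the output maps differ.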

Each output feature map has its own offset $b_j$, as can be seen, and the output feature maps differ because their convolution kernels differ. Convolution of feature map $i$ of the input layer, for example, contributes to output feature maps $x_1^l$ and $x_2^l$, but the convolution kernels of these two convolution processes are different. After random dropout is introduced, neurons in the network layer are randomly cleared in each iteration, reducing the probability that particular nodes appear in every iteration; the update of network-layer parameters is thus decoupled from the interdependence between nodes when incoming samples are computed, and the learned features appear randomly and without co-adaptation. The BN algorithm is as follows: (1) Normalization:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \quad (4)$$

$\mu_B$ in formula (4) refers to the average activation of the neurons over each batch of training data, and $\sigma_B$ represents the standard deviation of the neurons' activations over each batch of data. (2) Transformation and reconstruction:

$$y_i = \gamma \hat{x}_i + \beta \quad (5)$$

The parameters $\gamma$ and $\beta$ need to be obtained by training with the BP algorithm. In this way, when $\gamma$ and $\beta$ satisfy formulas (6) and (7),

$$\gamma = \sqrt{\operatorname{Var}[x]} \quad (6)$$

$$\beta = E[x] \quad (7)$$

the characteristics learned by the original layer can be restored. (3) The forward propagation process of the BN network layer is

$$y = \gamma \cdot \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta. \quad (8)$$

The input value is $x$, $m$ is the batch size, $\mu_B$ is the mean, $\sigma_B^2$ is the variance, and $y$ is the output.
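The BN forward pass just described can be sketched in a few lines. The batch size and feature dimension below are arbitrary; with $\gamma = 1$ and $\beta = 0$ the output is simply the normalized activations:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """BN forward pass over a batch: rows of x are the m samples in the batch."""
    mu = x.mean(axis=0)                    # per-feature batch mean (mu_B)
    var = x.var(axis=0)                    # per-feature batch variance (sigma_B^2)
    x_hat = (x - mu) / np.sqrt(var + eps)  # step (1): normalization
    return gamma * x_hat + beta            # step (2): scale and shift by gamma, beta

# Arbitrary batch of 8 samples with 3 features, shifted and scaled away from N(0, 1).
rng = np.random.default_rng(2)
x = rng.standard_normal((8, 3)) * 5.0 + 2.0
y = batchnorm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
```

In training, `gamma` and `beta` would be updated by back propagation, so the layer can recover the original representation when that is optimal, as formulas (6) and (7) indicate.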

The random dropout model makes the network structure corresponding to each input sample different. For a given input sample, even though some nodes are cleared, the weights of these nodes still exist; they are simply not updated in that iteration, which is to say the weights are shared. In this way, different samples correspond to different models, and the trained system is highly robust. The basic idea of DL is that each layer is pretrained by unsupervised learning, with the output of the previous layer used as input to produce a new representation of the data. After pretraining, fine-tuning is performed with a joint training algorithm.
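The dropout mechanism just described can be sketched as follows. This uses the common "inverted dropout" formulation (an assumption here), in which surviving activations are rescaled so the expected activation is unchanged:

```python
import numpy as np

def dropout(a, p, rng, train=True):
    """Zero each activation with probability p during training; rescale survivors
    by 1/(1-p) so the expected activation is unchanged (inverted dropout)."""
    if not train:
        return a  # at test time every node participates with its shared weights
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

rng = np.random.default_rng(3)
a = np.ones(1000)
out = dropout(a, p=0.5, rng=rng)  # roughly half the units are cleared
```

The cleared units keep their weights; a later iteration with a different random mask will update them, which is the weight sharing the text describes.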

4. Results Analysis and Discussion

CNN is a DL method inspired by the biological visual cognitive mechanism; it has played an important role in the history of DL and is one of the first deep models to achieve good performance. CNN is an improvement on the traditional neural network and can be used to process data with a grid-like structure.

The essence of DL is that a DNN can automatically extract effective feature representations from data and make decisions by fitting a complex function mapping according to the target task. DL involves different techniques at different stages. Back propagation passes the error from the last layer backward through iterative steps; in the resulting error signal map, each error signal corresponds to an area of the convolution layer. Compared with the traditional NN back propagation algorithm, CNN shows clearly better classification performance when there are enough iterations. Figure 3 shows the recognition performance of the two network structures as the number of iterations runs from 1 to 200.

It can be seen that the final error rate of CNN is low when the number of iterations is 200, and it has not yet reached convergence, with a downward trend. Relatively speaking, the NN back propagation algorithm converges more slowly and is obviously unstable in the iterative process, and its error rate remains high. Compared with the NN back propagation algorithm, CNN has obvious advantages in recognition rate.

When working with raw big data, preprocessing is required to remove anomalous data, balance the importance of the various data, and make network training easier. When designing the network, we must consider differences in data processing, network structure complexity, the objective function, and ease of training. The convergence and speed of training must be balanced when designing the objective function. During training, we must consider variables such as the mini-batch size per iteration and the optimization method's hyperparameters. When training results are abnormal, it is important to identify and fix the cause. Figure 4 depicts the training and test errors in this experiment at various iterations.

Figure 5 compares the DNN, CNN, and NN experiments. Compared with the convolutional neural network, the DNN shows a slight decrease in recognition rate, but the gap is small; compared with the back propagation algorithm, it shows an obvious improvement. Overall, the DNN has clear advantages.

Compared with traditional neural networks, DNN benefits from improvements in both the pretraining and fine-tuning stages. The batch gradient descent method effectively ensures optimization of the objective function and treats all training samples equally. However, when there are similar samples in the data set, batch gradient descent performs redundant, avoidable calculations; and when the training set is particularly large, computing the loss and parameter gradients over all samples at once requires huge computing resources and memory, which current hardware often cannot provide. The stochastic gradient descent method, which updates the parameters of the fitted function using only one randomly selected sample at a time to compute the loss and gradient, solves both problems. The purpose of the randomness is to ensure that every type of sample influences the parameter updates during optimization. Overfitting on small samples can be mitigated with random dropout. This experiment uses a small number of samples to test the role of random dropout in neural networks: the concept is introduced, results with and without dropout are compared, and the dropout ratio is then varied to find the best value. Figures 6-8 depict the experimental results.
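The contrast between batch and stochastic gradient descent drawn above can be sketched on a toy least-squares problem (the problem, learning rate, and step counts are illustrative assumptions):

```python
import numpy as np

def batch_gd_step(w, X, y, lr):
    """Full-batch step: gradient of the mean squared loss over ALL samples."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sgd_step(w, X, y, lr, rng):
    """Stochastic step: gradient computed from a single randomly chosen sample."""
    i = rng.integers(len(y))
    grad = (X[i] @ w - y[i]) * X[i]
    return w - lr * grad

# Noiseless synthetic regression problem; w_true is what both methods should recover.
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(5000):
    w = sgd_step(w, X, y, lr=0.01, rng=rng)
```

Each stochastic step touches one sample instead of all 200, which is exactly the memory and compute saving the text describes, at the price of noisier individual updates.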

To calculate the partial derivatives, the error signal map of the current layer must be calculated first. Because the downsampling layer is followed by a convolution layer, it is necessary to make clear which area of the input map corresponds to each output pixel; that is, which region in the current error signal map corresponds to a given pixel in the next layer's error signal map. The best way to improve a CNN model's generalization is to train with more data. Although the update direction of stochastic gradient descent is not necessarily optimal at each step, in practice it converges to a good local minimum as effectively as batch gradient descent. The difference is that, because stochastic gradient descent updates the parameters from only one sample at a time, it can skip past a local minimum during iteration and find a better local minimum or even the global optimum.

Given that DL is mostly suited to large data sets, an improved DNN structure is proposed for small samples. The whole DNN process can be divided into two stages, pretraining and parameter fine-tuning, and the improved algorithm downsamples the samples in the pretraining stage. In the fine-tuning stage, random dropout is introduced: a portion of the nodes in the hidden layer are randomly cleared, and their weights are not updated. The recognition rate and running time of this method and other classical algorithms are compared and analyzed experimentally. The results show that on small samples, the DNN improves in recognition rate and running time after downsampling and random dropout are introduced, and the overfitting phenomenon is effectively alleviated, demonstrating the superiority of the algorithm.
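The pretraining-stage downsampling mentioned above can be sketched as follows. The paper does not specify its exact downsampling operator, so mean pooling is assumed here purely for illustration:

```python
import numpy as np

def downsample(x, factor=2):
    """Shrink a sample image by mean pooling over factor-by-factor blocks."""
    h, w = x.shape
    x = x[:h - h % factor, :w - w % factor]  # crop so dimensions divide evenly
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
small = downsample(img)  # each output pixel is the mean of one 2x2 block
```

Shrinking the pretraining inputs this way reduces the number of parameters the early layers must fit, which is consistent with the small-sample motivation above.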

5. Conclusions

DL is an innovation and advance in the field of computer vision that brings hope for solving many problems; it is one of the hottest technologies of the twenty-first century. It builds a deep hierarchical model structure by simulating the nervous system of the human brain, passes the original data through layer by layer, and finally transforms it into higher-level, more abstract features. In academia, studying and analyzing the operating mechanism of deep neural networks and revealing the characteristics they learn are both a research hotspot and a requirement for the future development of artificial intelligence. To overcome the shortcomings of traditional image recognition methods, this paper proposes an adaptive CNN model based on CNN research. In this model, the parameters of each layer's individual feature maps are adaptively adjusted by a parameterized linear correction unit, and the convolution layer's output is subsampled by downsampling, which reduces the amount of data to process while ensuring that useful information is not lost. By improving and optimizing the DL model and applying it to the field of ancient large-scale figurine image recognition, the complicated preprocessing required by traditional algorithms can be avoided, playing a positive role in the development of ancient large-scale figurine image recognition technology.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares no conflicts of interest.

Acknowledgments

This study was supported by the 2020 Start-Up Fund Project for Scientific Research of High-Level Talents, project No. GCRC202007, project name: research on techniques of large-scale heavy pottery in ancient China.