Abstract

Machine learning is the core field of artificial intelligence and underpins the digital management of intangible cultural heritage. However, existing theoretical and practical research indicates that many gaps remain in this field. Huaer is a folk song genre popular in Qinghai, Gansu, the Ningxia Hui Autonomous Region, and parts of Xinjiang. Known as the soul of the northwest, it is a national intangible cultural heritage and was listed as an intangible cultural heritage of humanity by the United Nations in September 2009. With the rapid development of network technology and machine learning, managing the online dissemination and deep mining of Huaer information has become very important. In this regard, the use of machine learning-based natural language processing to mine the information in Huaer lyrics is proposed. A Huaer model based on a recurrent neural network (RNN) is constructed to mine Huaer lyrics, and Python built-in modules are interconnected with dynamic Web pages. Four Huaer image segmentation methods and five deep learning algorithms are proposed, and the steps of an image segmentation algorithm based on block technology and of the BP neural network algorithm are introduced. The results can provide new ideas for the protection and inheritance of musical intangible cultural heritage and provide effective, high-quality information for Huaer researchers and enthusiasts.

1. Introduction

Artificial intelligence is among the greatest scientific and technological innovations of the 21st century [1–5]. As the core field of artificial intelligence, machine learning aims to let computers learn by themselves. A machine learning algorithm enables a computer to identify regularities in observed data, build a model to explain the world, and make predictions without explicitly preprogrammed rules and models [6–9]. Therefore, using machine learning to protect intangible cultural heritage can help people pass it on more effectively.

Huaer refers to the traditional folk songs popular in Northwest China, in regions such as Gansu, Ningxia, and Qinghai. As a world intangible cultural heritage, it is the most spirited, magnificent, and heroic emblem of Chinese folk songs. It is also the most complete and representative model in the Chinese folk song music system. Huaer arose from the working life of the people. Previous research has examined the musical attributes and cultural forms of Huaer extensively and in depth. The research content covers the musical characteristics of Huaer, aesthetic identity, humanistic spirit, inheritance, and protection. The research objects include Gansu Huaer, Xinjiang Huaer, Qinghai Huaer, and Ningxia Huaer. Research perspectives include musicology, anthropology, folklore, and ecology [10–13].

With the continuous development of computer technology, Internet users at home and abroad obtain information ever more rapidly, and people's spiritual and cultural needs keep growing. Huaer enthusiasts and Huaer researchers urgently need effective, comprehensive, and large-scale Huaer information [11]. At present, world culture is in a period of great development, exchanges and blending among ideas and cultures are more frequent, and the role and status of culture in the competition of comprehensive national strength are more prominent. As a part of the traditional culture of China, Huaer also needs to be developed and disseminated [13]. In this regard, the author uses the advantages of computer technology to analyze the characteristic culture of Huaer in this region. By studying the development of Huaer, the author concludes that very little Huaer information is available on the Internet and that the content is incomplete and difficult to collect. As an art unique to this region, Huaer has great room for development but is constrained by limited resources in the process of inheritance and development. Therefore, research on Huaer information is both necessary and urgent.

Unfortunately, existing theoretical and practical research indicates that many gaps remain in this field. How to identify intangible cultural heritage with the help of machine learning, build a model that explains the characteristics of Huaer music, and then pass on intangible cultural heritage with artificial intelligence is an urgent practical problem facing the academic community. From the perspective of machine learning, this study examines and discusses the identification of Huaer, a world intangible cultural heritage.

Current research also discusses the identification of Huaer, but it mostly focuses on distinguishing various characteristics of Huaer, such as musical structure, mode and tonality, singing style, and poetic characteristics. On the whole, research on these characteristics relies on verbal description rather than strategy extraction, so it offers little guidance for using artificial intelligence to improve Huaer recognition. Few studies address how to adopt machine learning strategies to identify Huaer effectively.

Apart from a few studies that have developed emotion recognition systems for music audio, most studies focus on audio pitch recognition and note recognition. Empirical, machine learning-based studies of a Huaer case base are scarce; the closest work concerns the regional pattern classification of Chinese folk songs [14–17]. Therefore, domestic research on Huaer recognition needs to be strengthened in the comprehensiveness of its research methods, the innovation of its research perspectives, and the diversity of its research content.

In terms of machine learning theory, foreign research on folk song recognition has produced far more results than comparable domestic research in research methods and perspectives, online recognition, and music emotion [18]. In addition, analysis of folk song recognition through deep neural network algorithms is rare both at home and abroad; most existing systems are fuzzy systems based on sound-level contour and harmony analysis. Moreover, the theoretical recognition systems of other scholars still emphasize content while neglecting category.

To sum up, in addition to the gaps and deficiencies in theoretical research, practical and objective factors also make it harder to identify folk song species. For example, the debate over whether Longnan folk songs belong to the Huaer of Longzhong or remain ordinary folk songs is intensifying in academic circles, and the relevant empirical research needs to follow up. Research on traditional music recognition at home and abroad also has deficiencies: it focuses on case description, interpretation, and strategy refinement while neglecting strategy classification and the construction of an integrated theoretical system, and its systematic and interdisciplinary vision still needs to be improved. No study has yet introduced machine learning into Huaer recognition research. Therefore, from the perspective of machine learning, this study draws on musicology and computer science and technology to explore the construction of a Huaer resource database and to summarize various algorithms and strategies for Huaer recognition.

3. Basic Concepts

3.1. Recurrent Neural Network

A recurrent neural network (RNN) [19–21] is a kind of neural network designed specifically for processing sequences. An RNN can scale to long sequences, and most RNNs can also handle variable-length sequences. It shares parameters across time steps: each output item is a function of the previous output items and is generated by applying the same update rule to them. RNNs can also be applied to data spanning two spatial dimensions. When an application involves temporal data and the whole sequence can be observed before it is provided to the network, the RNN can also have connections that run backward in time. The basic structure of the mainstream recurrent neural network model is shown in Figure 1.

The left half of Figure 1 shows the basic structure of the RNN model before it is unfolded in time (the black box in the figure represents a delay of a single time step), and the right half shows the model unfolded in time. In the structure diagram, the quantities at each time step t are as follows:
(1) x(t) represents the input of the training sample at time step t, and x(t−1) and x(t+1) represent the inputs of the training sample at time steps t − 1 and t + 1, respectively.
(2) h(t) represents the hidden state of the model at time step t, obtained through the activation function of the hidden layer. h(t) is determined by x(t) and h(t−1).
(3) o(t) represents the output of the model at time step t. o(t) is determined only by the current hidden state h(t) of the model. At the output, the sigmoid function is generally used for binary classification problems, and the softmax function is used for K-category classification problems.
(4) L(t) represents the loss of the model at time step t. L(t) measures the gap between the output value o(t) and the corresponding training target y(t).
(5) y(t) represents the target output of the training sample sequence at time step t.
The connections from the input layer to the hidden layer, from the hidden layer to the output layer, and from the hidden layer to the hidden layer are parameterized by the weight matrices U, V, and W, respectively.
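The per-step relationships among x(t), h(t), o(t), and L(t) can be illustrated with a minimal forward-pass sketch. It assumes a tanh hidden activation, a softmax output, a cross-entropy loss, and no bias terms; these choices and all function names are illustrative assumptions and are not taken from the experiment.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_forward(xs, ys, U, W, V, h0):
    """Run the unfolded RNN over inputs xs with integer class targets ys.

    U: input-to-hidden, W: hidden-to-hidden, V: hidden-to-output weights.
    Returns the hidden states, output distributions, and losses per step.
    """
    h, states, outputs, losses = h0, [], [], []
    for x, y in zip(xs, ys):
        # h(t) depends on x(t) through U and on h(t-1) through W.
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, h))]
        h = [math.tanh(p) for p in pre]
        # o(t) depends only on h(t) through V (assumed softmax output).
        probs = softmax(matvec(V, h))
        states.append(h)
        outputs.append(probs)
        # L(t): cross-entropy between the prediction and the target y(t).
        losses.append(-math.log(probs[y]))
    return states, outputs, losses
```

Because the same U, W, and V are reused at every step, the loop body is identical for all t, which is the parameter sharing described in Section 3.1.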

3.2. Hypertext Preprocessor

Hypertext Preprocessor (PHP) is a general-purpose programming language originally designed for dynamic Web page development [22, 23]. Its syntax combines features of C, Java, and Perl with constructs of its own and is simple and easy to learn. PHP is widely used for, and mainly applicable to, dynamic Web page development. It can execute dynamic Web pages faster than common gateway interface (CGI) programs or Perl. Unlike other programming languages, PHP embeds the program in hypertext markup language (HTML) for execution, which is much more efficient than a CGI program that generates all of the HTML tags itself [4]. PHP can also execute compiled code; compilation can encrypt and optimize the code so that it runs faster.

3.3. Python Built-In Module and Dynamic Web Page Interconnection

Python is an interpreted programming language and therefore runs through an interpreter. Python has many advantages over other languages because of its extensibility, cross-platform support, and other characteristics. Its extensibility is reflected in its modules, and its powerful libraries provide effective support for cutting-edge computing disciplines such as machine learning [24, 25]. The Python modules involved in this experiment include (1) the Gensim module, which is applied to automatically extract semantic topics from text, and (2) sys, os, time, json, process, and the network module socket in the Python standard library.

Gensim is an open-source third-party Python module for unsupervised learning of latent topic vector representations from raw, unstructured text. It supports a variety of topic model algorithms, including term frequency-inverse document frequency (TF-IDF), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word2vec; it supports streaming training; and it provides application program interfaces (APIs) for common tasks such as similarity calculation and information retrieval. In this experiment, we use word2vec, a tool released by Google for obtaining word vectors. The internal algorithm of the tool transforms words into vectors through a learned neural model. The word vectors output by the word2vec model can be used for many natural language processing tasks, such as clustering, finding synonyms, part-of-speech analysis, and prediction.

In this experiment, an open-source Python-PHP bridging project from Google is used to interconnect Python with dynamic Web pages, so that the advantages of the two languages complement each other. Put simply, it is a mixed programming technology that combines Python programs with PHP programs. Python and PHP each have their own internally defined data types. In traditional approaches, transcoding is required whenever PHP data are sent to Python or Python data are sent to PHP, whereas the bridging technology sends data directly by serializing the different data types of Python and PHP without transcoding, which greatly improves development speed.

Because of its global interpreter lock (GIL), the Python language has low multithreading efficiency. Under the Python-PHP mixed programming mechanism, the Python end can be deployed in a multi-process manner to improve the overall efficiency of the Python program, which works around the multithreading limitation of Python. The basic principle of the bridging technology is socket communication, so support from the network module socket is required. A socket is a basic component in network programming: it is essentially an information channel with a program at each end [8]. These programs may be located on different computers (connected through the network) and send information to each other through the socket. The main principle of network communication in Python is shown in Figure 2.
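The socket-based exchange of serialized data can be illustrated with a minimal sketch of the Python end. It assumes a length-prefixed JSON message format, which is an illustrative choice and not the wire format of the bridging project used in the experiment; the function names and the request fields (title, lyrics) are hypothetical.

```python
import json
import socket
import struct


def send_msg(sock, obj):
    """Serialize obj as JSON and send it with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj, ensure_ascii=False).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes, or raise if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed JSON message and decode it."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def handle_request(sock):
    """Python end: read one request and reply with the number of lyric lines."""
    request = recv_msg(sock)
    send_msg(sock, {"title": request["title"], "lines": len(request["lyrics"])})
```

In a deployment along the lines described above, each worker process would call a handler like `handle_request` on connections accepted from the Web side, so that several requests can be served in parallel despite the GIL.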

4. Construction of the Huaer Model Based on RNN

4.1. Python Crawler Building

Constructing a recurrent neural network model requires a large amount of data, so the Huaer lyric information used in this experiment is obtained with Python Web crawler technology. Crawler technology makes it possible to obtain Huaer lyric information from Web applications quickly and accurately and provides data support for the subsequent experiments. A Python Web crawler simulates an ordinary network exchange between a computer and a server: the computer sends a request to the server (with a request header and message body), and the server responds to the request (with an HTML file). The crawler imitates the computer by initiating a request to the server, accepting the content of the server's response, and parsing and extracting the required information.

By analyzing the sources of Huaer information, we found that the existing Huaer information is mainly distributed across the Web applications NetEase cloud music and QQ music. The crawler program is designed according to the Web page structure of each platform. Analysis shows that the NetEase cloud music Web pages have a multipage structure. The process of this type of Web crawler is as follows:
(1) Manually turn the pages, observe how the uniform resource locator (URL) of a Web page is composed, construct the URLs of all pages, and store them in a list
(2) Loop over the URL list to take out the URLs one by one
(3) Define the crawler function
(4) Call the crawler function in the loop and store the data
(5) After the loop finishes, end the crawler
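The five steps of the multipage crawler can be sketched as follows. The URL template, the `User-Agent` header value, and the function names are placeholders chosen for illustration; the actual page URLs and parsing rules used in the experiment are not given in the text.

```python
from urllib.request import Request, urlopen


def build_page_urls(template, first_page, last_page):
    """Step 1: construct the URL of every page from the observed pattern."""
    return [template.format(page=p) for p in range(first_page, last_page + 1)]


def fetch_html(url, timeout=10):
    """Step 3: the crawler function; request one page and return its HTML."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_pages(urls, fetch=fetch_html):
    """Steps 2, 4, and 5: loop over the URL list, crawl each page, store it."""
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except OSError as error:  # URLError and socket timeouts derive from OSError
            print(f"skipped {url}: {error}")
    return pages
```

Passing the fetch function as a parameter keeps the loop testable without network access; a parser for the lyric content would be applied to each stored page afterwards.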

The flow chart of multipage crawler is shown in Figure 3.

The QQ music Web pages have a cross-page structure, and the cross-page crawler process is as follows:
(1) Define the crawl function to crawl the URLs of all albums on the list page
(2) Save the album URLs to a list
(3) Define the function to crawl the data of a detail page
(4) Enter each album detail page and crawl its data
(5) Store the data, end the loop, and end the crawler program
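The cross-page process can be sketched with the standard-library HTML parser. The sketch assumes that album links on the list page are `<a>` tags carrying a CSS class named `album`; that class name, like the function names, is a hypothetical stand-in for whatever the real page structure uses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attributes.get("href"):
            self.links.append(attributes["href"])


def album_urls(list_html, base_url, css_class="album"):
    """Steps 1 and 2: extract album URLs from the list page into a list."""
    parser = LinkCollector(css_class)
    parser.feed(list_html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl_albums(list_html, base_url, fetch_detail):
    """Steps 3 to 5: visit every album detail page and store its data."""
    return {url: fetch_detail(url) for url in album_urls(list_html, base_url)}
```

Here `fetch_detail` is supplied by the caller, so the same loop works with a real HTTP request function or with a stub during testing.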

The flow chart of cross-page crawler is shown in Figure 4.

All the Huaer lyric data obtained in this experiment are saved in the text file for subsequent processing.

4.2. Lyric Splitting

English uses the space as an obvious separator between words, and an English word can be further divided linearly into letters. Chinese, in contrast, has no separator between words, a convention inherited from ancient Chinese, and its characters are composed of strokes, which are divided into eight kinds according to the "Eight Principles of Yong": dot, horizontal, vertical, left-falling, right-falling, turning, bending, and hook. In ancient Chinese, apart from personal names, place names, and compound words, a word was usually a single character, so word segmentation was unnecessary in writing. Modern Chinese contains many two-character and multi-character words, and a word is no longer equivalent to a single character, which makes Chinese word segmentation difficult. To obtain more effective experimental data and a more rigorous analysis of the experimental results, Jieba, a mainstream word segmentation tool, is applied to the Huaer lyrics. The crawled lyrics are denoised, special characters are removed, and the text is segmented with Jieba. After word segmentation, a corpus that can be used for training is obtained, and the RNN model is then trained on it.
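The cleaning and segmentation step can be sketched as follows. The rule of keeping only characters in the basic CJK range is an assumption made for illustration, since the text does not specify which special characters were removed. `jieba.lcut` is Jieba's list-returning segmentation call; the sketch falls back to single characters when Jieba is not installed.

```python
import re

# Assumed cleaning rule: keep basic CJK ideographs, drop everything else
# (punctuation, timestamps, Latin letters, digits, whitespace).
NON_CJK = re.compile(r"[^\u4e00-\u9fff]+")


def clean_lyrics(text):
    """Return the non-empty lyric lines with special characters removed."""
    cleaned = []
    for line in text.splitlines():
        kept = NON_CJK.sub("", line)
        if kept:
            cleaned.append(kept)
    return cleaned


def segment(line, cut=None):
    """Split a cleaned line into words.

    If no cut function is given, use jieba.lcut when Jieba is available
    and fall back to single characters otherwise.
    """
    if cut is None:
        try:
            import jieba
            cut = jieba.lcut
        except ImportError:
            cut = list
    return [word for word in cut(line) if word.strip()]


def build_corpus(text, cut=None):
    """Clean the raw text and segment every remaining line."""
    return [segment(line, cut) for line in clean_lyrics(text)]
```

The result is a list of token lists, one per lyric line, which is the sentence format that word-vector training tools generally expect.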

4.3. System Modeling

The larger the corpus, the better the training results of the model; the opposite holds for a smaller corpus. Model training requires the Python NLP package Gensim. Gensim must be installed first, and it places requirements on the versions of scientific computing libraries such as NumPy and SciPy, so their versions need attention. If the package can be imported without error, the installation has succeeded. In Gensim, the parameters related to the training algorithm belong to gensim.models.word2vec.Word2Vec. The parameters to be noted are as follows (the names are those of the Gensim release used in this experiment; Gensim 4.0 and later rename size to vector_size and iter to epochs):
(1) sentences: this parameter sets the corpus to be analyzed, which can be an in-memory sequence or text files. In this experiment, the corpus is read by traversing files.
(2) size: this parameter sets the dimension of the word vectors. The default value is 100. A suitable dimension generally depends on the size of the corpus. If the corpus is very small, such as a text corpus of less than 100 MB, the default value is used. If the corpus is large, the dimension is increased.
(3) window: this parameter is the maximum distance between the current word and a context word. The larger the window, the farther away a word can be and still be treated as context. The default value is 5. In practice, this parameter can be adjusted according to actual needs. For a small corpus, the value can be set smaller.
(4) sg: this parameter selects between the two word2vec models. If it is 1, the skip-gram model is used. If it is 0, the continuous bag-of-words (CBOW) model is used. The default value is 0.
(5) hs: this parameter selects between the two training methods of word2vec. If it is 1, hierarchical softmax is used. If it is 0 and the number of negative samples is greater than 0, negative sampling is used. The default value is 0.
(6) negative: this parameter is the number of negative samples when negative sampling is used. The default is 5. A value in [3, 10] is recommended.
(7) cbow_mean: this parameter controls the CBOW projection. If it is 1, xw in the algorithm is the average of the context word vectors. If it is 0, xw is the sum of the context word vectors. In this study, the average value is used to represent xw, which is also the default value of 1. Modifying the default value is not recommended.
(8) min_count: this parameter is the minimum frequency a word must have for its vector to be computed. It removes rare, low-frequency words. The default is 5. If the corpus is too small, this value is lowered.
(9) iter: this parameter is the maximum number of iterations of stochastic gradient descent. The default value is 5. For smaller corpora, the value can be reduced, and for larger corpora, it can be increased.
(10) alpha: this parameter is the initial step size of stochastic gradient descent. The default is 0.025.
(11) min_alpha: because the algorithm gradually reduces the step size during iteration, min_alpha gives the minimum step size.
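The effect of the window and min_count parameters on the training data of the skip-gram model (sg = 1) can be illustrated with a small pure-Python sketch. It only enumerates (center word, context word) pairs and does not train any vectors; it is a simplified illustration rather than Gensim's implementation, and the function names are hypothetical.

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Keep only words that occur at least min_count times in the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


def skipgram_pairs(sentences, window=5, min_count=5):
    """List (center, context) pairs within `window` positions of each word.

    Words below min_count are dropped before the window is applied, so
    removing rare words brings the remaining words closer together.
    """
    vocab = build_vocab(sentences, min_count)
    pairs = []
    for sentence in sentences:
        words = [w for w in sentence if w in vocab]
        for i, center in enumerate(words):
            start = max(0, i - window)
            stop = min(len(words), i + window + 1)
            pairs.extend((center, words[j]) for j in range(start, stop) if j != i)
    return pairs
```

With a small corpus, lowering min_count keeps more of the vocabulary and a smaller window keeps the pairs local, which matches the parameter advice given in the list above.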

Calling word2vec.Word2Vec() is sufficient to train the model. For corpora of different sizes, the algorithm parameters need to be adjusted to achieve better training results. Once training is complete, the model should be saved for reuse. In word2vec, there are two ways to save the model: one is to save the model directly, and the other is to store it in a format that can be parsed by the C language implementation. Either can be chosen according to need. In addition, the speed of model training is affected by the running environment of the training program and the size of the corpus. When the corpus is very large, a computer with better performance trains the model faster.

4.4. Identification Strategy

The machine learning recognition strategy is a set of systematic and operable Huaer recognition algorithms that can provide theoretical support and practical guidance for Huaer recognition models and for the retrieval field. A folk song category artificial intelligence (AI) system developed according to the recognition strategy can distinguish the categories of folk songs according to the model. Before the characteristics of Huaer music can be recognized, a database of sufficient scale and sample size must be established.

4.4.1. Huaer Database Construction

A database is a warehouse that organizes, stores, and manages data according to a data structure. It is a large, organized, shareable, and uniformly managed collection of data stored in computers for a long time. The Huaer database is a collection of Huaer music resource data that are stored together in a defined way, can be shared among multiple users, have as little redundancy as possible, and are independent of any application. It can be regarded as an electronic file cabinet, that is, the place where electronic files are stored. Users can add, query, update, and delete Huaer music data in it.
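The add, query, update, and delete operations can be sketched with the SQLite module of the Python standard library. The table name and its columns (title, region, media_type) are a hypothetical schema chosen for illustration; the schema of the actual Huaer database is not described in the text.

```python
import sqlite3


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS huaer (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               region TEXT NOT NULL,
               media_type TEXT NOT NULL
           )"""
    )
    return conn


def add_record(conn, title, region, media_type):
    """Add one resource and return its generated id."""
    cursor = conn.execute(
        "INSERT INTO huaer (title, region, media_type) VALUES (?, ?, ?)",
        (title, region, media_type),
    )
    conn.commit()
    return cursor.lastrowid


def find_by_region(conn, region):
    """Query all resources collected from one region, oldest first."""
    rows = conn.execute(
        "SELECT id, title, media_type FROM huaer WHERE region = ? ORDER BY id",
        (region,),
    )
    return rows.fetchall()


def update_title(conn, record_id, title):
    """Update the title of an existing resource."""
    conn.execute("UPDATE huaer SET title = ? WHERE id = ?", (title, record_id))
    conn.commit()


def delete_record(conn, record_id):
    """Delete one resource by id."""
    conn.execute("DELETE FROM huaer WHERE id = ?", (record_id,))
    conn.commit()
```

Parameterized queries (the `?` placeholders) keep user-supplied titles and region names from being interpreted as SQL, which matters once the database is shared among multiple users.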

Huaer is popular in the northwestern provinces (regions) of Gansu, Qinghai, and Ningxia, where it is sung by the local Han, Hui, Tibetan, Dongxiang, Baoan, Sala, Tu, Yugu, and other nationalities. Because it covers such a wide range of areas and communities, materials are difficult to collect. Therefore, the construction of the Huaer database can apply the method of typical example extraction and collect representative Huaer tunes, such as those of Gansu Huaer, Qinghai Huaer, and Ningxia Huaer. The collected genres and forms include Huaer literary theory, pictures, music scores, and audio and video resources, with a focus on audio recordings of all kinds of tunes sung by Northwest Huaer masters (Zhonglu Zhu in Gansu, Jun Ma in Qinghai, Dexian Wang in Ningxia, etc.). After collection, the resources are classified, coded, labeled, and analyzed so that the resource database can be updated and expanded dynamically in real time. After completion, the database will be uploaded to the existing local official social networking website platform to provide free online sharing services for the whole society. The database is shown in Figure 5.

4.4.2. Huaer Recognition Path

Recognition here refers to the development of a human-level recognition system based on deep learning. Machine learning predicts the image (music score), sound (tone), and other data of Huaer through multilayer nonlinear feature learning and hierarchical feature extraction. Its core is the neural network algorithm.

(1) Image Recognition of the Huaer Music Score. According to the writing order, block and line features of Huaer notation and staff notation, image segmentation methods such as DFS-based image marker segmentation method, bounding box algorithm, adaptive iterative projection smoothing algorithm, and score image segmentation method based on cluster analysis are proposed, to lay a foundation for folk song information classification and recognition.

An image segmentation algorithm based on block technology is proposed in this study, and the main steps of image segmentation algorithm to extract Huaer score image are introduced as follows.

First, a music score image is divided into independent small images, and each small image is given an initial division. The initial divisions of the small images are then fused, one point is randomly selected from each division of the fused result, and the selected points constitute the feature data of the original score image. Finally, spectral clustering is applied to these feature data, and the category of each pixel in the original image is determined from the clustering results. The proposed algorithm therefore consists of four parts: image blocking, initial presegmentation, clustering of the feature data, and determination of the category of pixels in the original image.
(1) Image blocking. The purpose of blocking is to allow a large music score image to be loaded into memory piece by piece for processing, which enhances the applicability of the algorithm and avoids the problem that an overly large music score image cannot be processed. After the large score image is divided into blocks, the amount of data and the computational load of each operation are reduced.
(2) Initial presegmentation. Each independent small image obtained by blocking is given an initial division, and the initial division of one small image is selected as the reference. Without loss of generality, the initial division of the first small image is taken as the reference, and the divisions of the remaining small images are compared with it. If a division of a small image has the same gray level as a reference division, the pixels of that division are assigned to the reference division. If the category of pixels in a division of a small image does not appear in the reference, a new division is generated and added to the reference. In this way, an initial division of the whole image is formed.
(3) Clustering of feature data. One pixel is randomly selected from each division of the initial division of the whole image, and the gray value of the original music score image at that pixel is taken as an input object for spectral clustering. The data obtained in this way are called the feature data. The feature data are clustered using the spectral clustering algorithm, and each feature datum receives a class mark that takes one of k values, where k is the number of categories to be divided.
(4) Determination of the category of pixels in the original music score image. All pixels in a division are assigned to the same category as the pixel sampled from that division, which yields the segmentation result of the original score image.
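The four parts of the block-based algorithm can be sketched on a small grayscale image stored as a list of rows. Two simplifications are assumptions of this sketch and not the method of the study: the initial division of each block is obtained by quantizing gray values into a few levels, and a one-dimensional k-means routine stands in for spectral clustering. All function names are hypothetical.

```python
import random


def split_blocks(image, block_h, block_w):
    """Part 1: cut the image into (top, left, block) tiles."""
    blocks = []
    for top in range(0, len(image), block_h):
        for left in range(0, len(image[0]), block_w):
            block = [row[left:left + block_w] for row in image[top:top + block_h]]
            blocks.append((top, left, block))
    return blocks


def presegment(image, block_h, block_w, levels=4):
    """Part 2: divide each block by quantized gray level and fuse the
    divisions of different blocks that share the same level."""
    divisions = {}  # quantized gray level -> list of (row, col) pixels
    for top, left, block in split_blocks(image, block_h, block_w):
        for r, row in enumerate(block):
            for c, gray in enumerate(row):
                level = gray * levels // 256
                divisions.setdefault(level, []).append((top + r, left + c))
    return list(divisions.values())


def kmeans_1d(values, k, iterations=20):
    """Cluster scalars into k groups (stand-in for spectral clustering)."""
    ordered = sorted(values)
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def segment_score(image, block_h, block_w, k=2, seed=0):
    """Parts 3 and 4: sample one pixel per division, cluster the samples,
    and give every pixel the class of its division's sample."""
    rng = random.Random(seed)
    divisions = presegment(image, block_h, block_w)
    samples = [rng.choice(pixels) for pixels in divisions]
    features = [image[r][c] for r, c in samples]
    labels = kmeans_1d(features, k)
    result = [[0] * len(image[0]) for _ in image]
    for pixels, label in zip(divisions, labels):
        for r, c in pixels:
            result[r][c] = label
    return result
```

Because clustering runs only on one sampled value per division, its cost depends on the number of divisions and not on the number of pixels, which is the motivation for the blocking and presegmentation steps.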

(2) Feature Recognition of Huaer Music. The Bayes, nearest neighbor, BP neural network, decision tree, and support vector machine algorithms can support Huaer music feature recognition [26–28]. Four recognition modules, namely audio processing, feature extraction, pattern classification, and semantic representation, are proposed according to the pitch, melody, rhythm, tone, and beat of Huaer melodies. The BP neural network method [29–33] is proposed for feature recognition of Huaer music in this study, and the main steps of extracting Huaer music features with the BP neural network algorithm are introduced as follows.

BP neural network is a multilayer feedforward neural network. The main characteristics of the network are signal forward transmission and error back propagation. In forward transmission, the input signal is processed layer by layer from the input layer through the hidden layer to the output layer. The neuron state of each layer only affects the neuron state of the next layer. If the output layer cannot get the expected output, it will turn to back propagation and adjust the network weight and threshold according to the prediction error, so as to make the prediction output of BP neural network approach the expected output continuously. The topology of BP neural network can be regarded as a nonlinear function. The network input value and prediction value are the independent variable and dependent variable of the function, respectively.

When the number of input nodes is n and the number of output nodes is m, the BP neural network expresses a functional mapping from n independent variables to m dependent variables. Before the BP neural network can be used for prediction, it must be trained. Through training, the network acquires the ability of associative memory and prediction. The training process of the BP neural network includes the following steps:
Step 1: network initialization. According to the input and output sequences of the system, the numbers of input layer nodes, hidden layer nodes, and output layer nodes are determined. The connection weights between the input layer, hidden layer, and output layer neurons are initialized. Then, the thresholds of the hidden layer and the output layer are initialized. The learning rate and the neuron excitation function are given.
Step 2: hidden layer output calculation. According to the connection weights between the input layer and the hidden layer and the hidden layer thresholds, the hidden layer output is calculated.
Step 3: output layer output calculation. According to the hidden layer output, the connection weights, and the thresholds, the prediction output o of the BP neural network is calculated.
Step 4: error calculation. The network prediction error is calculated from the network prediction output and the expected output.
Step 5: weight update. The network connection weights are updated according to the network prediction error.
Step 6: threshold update. The network node thresholds are updated according to the network prediction error.
Step 7: termination check. Whether the algorithm iteration has finished is judged. If not, return to Step 2.
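The seven training steps can be sketched for a network with one hidden layer. The sketch assumes the sigmoid as the excitation function, a squared-error criterion, per-sample updates, and small uniform random initial values; these choices, the default hyperparameters, and the function names are illustrative assumptions, since the text does not fix them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Step 1: initialize connection weights and thresholds."""
    rng = random.Random(seed)
    rand = lambda: rng.uniform(-0.5, 0.5)
    return {
        "w_ih": [[rand() for _ in range(n_hidden)] for _ in range(n_in)],
        "a": [rand() for _ in range(n_hidden)],  # hidden layer thresholds
        "w_ho": [[rand() for _ in range(n_out)] for _ in range(n_hidden)],
        "b": [rand() for _ in range(n_out)],     # output layer thresholds
    }


def forward(net, x):
    """Steps 2 and 3: hidden layer output, then prediction output."""
    hidden = [sigmoid(sum(x[i] * net["w_ih"][i][j] for i in range(len(x))) - net["a"][j])
              for j in range(len(net["a"]))]
    output = [sigmoid(sum(hidden[j] * net["w_ho"][j][k] for j in range(len(hidden))) - net["b"][k])
              for k in range(len(net["b"]))]
    return hidden, output


def train_step(net, x, y, rate):
    """Steps 4 to 6 for one sample: error, weight update, threshold update."""
    hidden, output = forward(net, x)
    # Step 4: prediction error, scaled by the sigmoid derivative.
    d_out = [(y[k] - output[k]) * output[k] * (1 - output[k]) for k in range(len(y))]
    d_hid = [hidden[j] * (1 - hidden[j]) *
             sum(net["w_ho"][j][k] * d_out[k] for k in range(len(y)))
             for j in range(len(hidden))]
    # Step 5: weight update.
    for j in range(len(hidden)):
        for k in range(len(y)):
            net["w_ho"][j][k] += rate * d_out[k] * hidden[j]
    for i in range(len(x)):
        for j in range(len(hidden)):
            net["w_ih"][i][j] += rate * d_hid[j] * x[i]
    # Step 6: threshold update (thresholds enter with a minus sign).
    for k in range(len(y)):
        net["b"][k] -= rate * d_out[k]
    for j in range(len(hidden)):
        net["a"][j] -= rate * d_hid[j]


def mean_error(net, samples):
    """Mean squared prediction error over all samples."""
    total = sum((y[k] - forward(net, x)[1][k]) ** 2
                for x, y in samples for k in range(len(y)))
    return total / len(samples)


def train(net, samples, rate=0.5, epochs=2000, tolerance=1e-3):
    """Step 7: repeat Steps 2 to 6 until the error is small or epochs run out."""
    for _ in range(epochs):
        for x, y in samples:
            train_step(net, x, y, rate)
        if mean_error(net, samples) < tolerance:
            break
    return net
```

For Huaer feature recognition, each input vector x would hold extracted audio features and y an encoding of the target category; the tiny Boolean data sets typically used to try out such a sketch only exercise the mechanics.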

5. Case Study

5.1. Mining Results of Huaer Lyrics

The training of the experimental model is unsupervised learning. Unlike supervised learning, it has few objective evaluation methods, and its evaluation depends more on end-to-end application. The RNN is used to process Huaer information to realize clustering, synonym finding, part-of-speech analysis, and prediction for Huaer lyrics. The word2vec language model performs well, but it requires a large amount of data. The experimental results are shown in Tables 1–3.

5.2. Analysis of Mining Results

This experiment uses a recurrent neural network, an effective neural network designed to make comprehensive use of the forward and backward information in the history of a sequence. In terms of data collection, the information collected by the author is limited because of the historical attributes and cultural background of Huaer. In addition, the neural network requires a large corpus, and the accuracy of the training results still needs to be improved. A better approach would be to use speech recognition technology to convert the speech signals of existing Huaer video and audio into the corresponding text or commands and thereby obtain a large amount of Huaer data.

To present the traditional culture carried by Huaer, the key content of the Huaer lyrics is extracted to form a word cloud. Because there are too many words, only the 300 most frequent words are kept after the words are sorted by frequency. The word cloud is shown in Figure 6.
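Selecting the 300 most frequent words for the word cloud can be sketched with the standard library. The linear mapping from frequency to font size is an assumption added for illustration; the text does not say how the drawing tool scaled the words, and the function names are hypothetical.

```python
from collections import Counter


def top_words(tokens, limit=300):
    """Count the segmented words and keep the `limit` most frequent ones."""
    return Counter(tokens).most_common(limit)


def scale_sizes(ranked, min_size=10, max_size=30):
    """Map each word's frequency linearly onto a font-size range."""
    if not ranked:
        return {}
    highest = ranked[0][1]
    lowest = ranked[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in ranked:
        if span == 0:
            sizes[word] = float(max_size)
        else:
            sizes[word] = min_size + (count - lowest) / span * (max_size - min_size)
    return sizes
```

The most frequent word receives the largest size, which is why it dominates the center of a word cloud drawn from such a ranking.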

By observing the word cloud picture, we can see that the word “Huaer” (the center of Figure 6) has the highest word frequency, and other words are successively expanded from the origin and distributed at each point according to the word frequency.

6. Conclusions

The 21st century is an era of intelligence. The research results of artificial intelligence have entered many fields such as science and engineering, humanities, and social sciences, as well as intangible cultural heritage protection and people’s daily life. With the rapid development of science and technology, a large number of Huaer samples are identified by means of machine learning, which not only provides a strong sample library and gene library for the survival of Huaer as intangible cultural heritage, but also provides fertile soil for subsequent intelligent machine creation and inheritance of Huaer. In the face of the realistic dilemma of lack of successors and generation division in inheriting music intangible cultural heritage, this research may provide new ideas for the protection and inheritance of music intangible cultural heritage.

In this study, deep learning-based natural language processing is applied to Huaer information mining through recurrent neural network sequence modeling. Many interesting features are found in the experiment. The principle of analyzing and mining Huaer text information can also be applied in many other areas. The development of artificial intelligence will bring more opportunities and challenges. Reasonable and effective use of computer technology can bring convenience to many aspects of human life, such as speech recognition, deoxyribonucleic acid (DNA) sequence analysis, emotion classification, machine translation, and named entity recognition. In the experiment, there were some difficulties in collecting Huaer information. The work therefore has certain limitations for practical applications, but the ideas provided lay a theoretical foundation for subsequent applied research. A larger corpus and an optimized internal RNN architecture would allow Huaer information to be analyzed more effectively. How to obtain a larger corpus with the help of speech recognition and video behavior recognition technology and how to optimize the internal network architecture are the next research problems.

Data Availability

The dataset can be accessed upon request to the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest.