Scientific Programming for Fuzzy System Modeling of Complex Industry Data
Analysis of Poetry Style Based on Text Classification Algorithm
To realize intelligent poetry style analysis, this paper applies a text classification algorithm to the analysis of poetry style, combines it with a knowledge representation algorithm for text classification and recognition, improves the algorithm, and applies it to a poetry style analysis model. Moreover, this paper combines intelligent algorithms to construct a poetry style analysis system, builds the system’s functional modules, preprocesses the poetry documents in the corpus, and maps them to a vector space that can be processed directly by the computer. After constructing the system model, this paper verifies the poetry style analysis system based on the text classification algorithm through simulation experiments. The experimental results show that the proposed poetry style analysis method based on the text classification algorithm classifies poetry effectively and meets the practical needs of poetry style analysis.
1. Introduction
Natural language processing technology has made considerable progress. As a branch of natural language processing, the computer processing of literary language has now come to the attention of academia. With the support of the National Natural Science Foundation of China, this paper makes a preliminary exploration in this field. Its focus is how to use machines to recognize the style of poetry. As one of the essences of traditional Chinese culture, classical poems carry rich meaning and simple yet deep sentiment, and their style is captivating. For a long time, however, the style of a poem has mostly been judged from the reader’s experience and feeling, and traditionally there have been no clear quantitative or formal rules. A machine can rely only on the text of the poem to determine its style. Formally, a poem is a passage of text, so judging its style can be regarded as classifying the style of a text. The problem is therefore essentially a text classification problem in machine learning: under a given style classification system, the machine automatically learns rules from the content of the texts and builds a classifier from these rules, and the classifier then automatically determines the style of a text. On this basis, this paper proposes a machine learning-based model framework for computing the style of Chinese classical poetry. This research is significant for expanding the application field of machine learning, for the informatization and mining of traditional poetry, and for the language understanding of poetry and the computer-aided study of literary works.
As one of the treasures in the history of Chinese literature, poetry has been passed down through the ages because of its enduring charm. This literary form, favored by the general public and by literati and poets alike, has grown, evolved, and spread, and it reached the peak of its development in the Tang and Song dynasties. Tang poetry has always been a research hotspot among scholars and lovers of literature. Since ancient times, countless experts and scholars have carried out extensive and in-depth research on Tang poetry and classical poetry from a literary perspective. With the continuous development of information science and technology, however, people have begun to use it to deal with language problems encountered in work and life. Natural language processing has developed rapidly in recent years. Given its successes for modern Chinese, some natural language processing methods have been applied to ancient poetry. At the same time, in view of the differences between ancient poetry and modern Chinese, this work takes a computational perspective, integrates technologies such as data mining and genetic algorithms with resources such as HowNet, and focuses on the application of text classification technology to the classification of poetry style.
In the process of writing poems, the differing talents, temperaments, literary accomplishments, life experiences, ideological outlooks, and feelings of poets affect the content and expression of their poems and give the poems different characteristics; poems therefore do not share a single style. However, when appreciating classical poems, attending only to the individual differences between poems while ignoring their common characteristics is likely to lead to nihilism.
This article combines the intelligent computer algorithm to apply the text classification algorithm to the poetry style analysis and builds an intelligent classification system, which lays the foundation for the intelligent classification of poetry.
2. Related Work
Scholars began early attempts to use computer technology to assist research on classical poetry. Databases and cultural knowledge bases of classical poetry have been built in the computer research laboratories of some universities and research organizations. As natural language processing for the Chinese language has advanced, it has gradually come to support the analysis of the rhythm, emotional elements, and style characteristics of classical poetry. “Computational poetics” was proposed in this context. There are already many research results on poetry style and related fields. One study, “The topic model-based corpus construction and computer-aided creation research,” uses an LDA-based topic model together with reference word recommendation to analyze word characteristics, lexical semantics, and style features in poetry. Another study represents poem texts with the vector space model (VSM) and proposes a two-class model of classical poems, bold versus graceful; the model is based on machine learning and natural language processing, and it classifies poetry styles with good results. A further study segments poems according to poetic meter and statistical word segmentation, builds a poetry vocabulary database, and uses machine learning methods for style feature analysis, classification evaluation, and sentiment analysis. In addition, research in related fields has focused on improving the algorithms of machine classification models for poetry style in order to gradually raise the accuracy of style evaluation. The early-stage procedures for corpus construction and computer processing are now well established, and research in this area continues to deepen. There has thus been a considerable amount of research on the classification and judgment of poetry style, which provides rich experience for the machine learning-based classification and judgment of poetry style in this article.
Processing poems can be regarded as a text processing problem with a special form of content. One study developed a “Poetry Computer Aided Research System,” which takes the vocabulary of a poem as the main research unit and supports word retrieval, word frequency statistics, and image indexing. Another proposed a natural language processing technique based on word connection and used it for understanding poetic language; it successfully carried out tests on the labeling of poetry vocabulary, a preliminary analysis of poetic language, and the evaluation of bold versus graceful style. A further study combined naive Bayes with a genetic algorithm and put forward a computational model that judges the bold or graceful style of classical poetry with each character of the poem as the smallest unit; tested on a corpus of classical poetry, it achieved good results. Bayesian classification has also been introduced into poetry research from the perspective of subject matter, with good experimental results. Another work proposed a poem topic classification method based on pointwise mutual information and LDA and achieved good classification results on the relationship between poem topics and the evolution of the times. Another uses the vector space model (VSM) to convert poetry text into vectors and selects word features through the chi-square test; the text classifier is then built with naive Bayes and support vector machine algorithms, and the average classification accuracy reaches 74.7%. With the rise of deep learning, an RNN was applied to the generation of ancient Chinese poetry, using the entire historical corpus of poetry as training data and adding constraints between the words and lines of the generated poetry, which achieved better results than traditional poetry generation systems. Another study uses an RNN model to generate quatrains and adds an attention mechanism so that the model can learn semantics, structure, rhyme, and other information at the same time. Based on the RNN model, a further study optimizes the word vectors of poetry and introduces an attention mechanism and hybrid training to obtain a model that produces topic-related poems from keywords, with good results.
3. Knowledge Representation of Intelligent Poetry Text Generation System
As in many other intelligent systems, the choice of knowledge representation is a key decision in system construction. We treat Chinese characters and works of art as images in a parametric form. Mathematically, the “image” of a calligraphic work C is represented as a band-shaped image area covered by a series of ellipses; that is, C is composed of a sequence of ellipses, each described by its center, semimajor axis, and semiminor axis. We define the text of traditional Chinese poetry as
The above knowledge representation method is inspired by the method of the paper. In this paper, a band-shaped area is defined as an ellipse moving along a predefined curve.
We now turn to the concept of equivalence and give a formal definition of the hierarchical representation of poetry text. Let $R$ be an equivalence relation defined on a domain $A$; that is, $R$ is (1) reflexive, (2) symmetric, and (3) transitive.
Under the relation $R$, the domain $A$ can be decomposed into a series of subsets $A_1, A_2, \ldots, A_n$ that satisfy the following properties: (a) if $i \neq j$, then $A_i \cap A_j = \emptyset$, and (b) $\bigcup_{i=1}^{n} A_i = A$.
Under the equivalence relation $R$, if two elements $a$ and $b$ of $A$ satisfy $a\,R\,b$, we say that $a$ is equivalent to $b$, which can be expressed as $a \sim b$.
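The decomposition of a domain into equivalence classes can be illustrated with a short sketch. It is not the paper's implementation: the function names and the toy mapping from strokes to radicals are hypothetical, chosen only to show that grouping by an equivalence relation yields subsets that are pairwise disjoint and that together cover the domain.

```python
from itertools import combinations


def equivalence_classes(domain, related):
    """Partition `domain` into classes under the equivalence relation `related`."""
    classes = []
    for item in domain:
        for cls in classes:
            # One representative per class is enough when `related` is
            # reflexive, symmetric, and transitive.
            if related(item, cls[0]):
                cls.append(item)
                break
        else:
            classes.append([item])
    return classes


def is_partition(domain, classes):
    """Check (a) pairwise disjointness and (b) that the union equals the domain."""
    disjoint = all(not (set(a) & set(b)) for a, b in combinations(classes, 2))
    covers = set().union(*classes) == set(domain)
    return disjoint and covers


# Hypothetical data: two strokes are equivalent when they form the same radical.
stroke_to_radical = {"s1": "r1", "s2": "r1", "s3": "r2", "s4": "r2", "s5": "r3"}


def same_radical(a, b):
    return stroke_to_radical[a] == stroke_to_radical[b]


classes = equivalence_classes(list(stroke_to_radical), same_radical)
print(classes)
print(is_partition(list(stroke_to_radical), classes))
```

Each resulting class plays the role of one higher-level element, which is how the equivalence relations are used to build the hierarchy in the following paragraphs.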
To establish the hierarchical representation of poetry text, we introduce four equivalence relations. Under the first, all structural ellipses used to form the same original stroke are equivalent to each other. Under the second, all original strokes used to form the same combined stroke are equivalent to each other. Under the third, all combined strokes used to form the same radical are equivalent to each other. Under the fourth, all radicals used to form the same Chinese character are equivalent to each other.
We assume that the structural ellipses are decomposed into num1 equivalence classes under the first relation, the num1 original strokes into num2 equivalence classes under the second, the num2 combined strokes into num3 equivalence classes under the third, and the num3 radicals into num4 equivalence classes under the fourth.
In other words, a specific poetic text work C contains a total of num1 original strokes, num2 combined strokes, num3 radicals, and num4 characters. When comprehensive reasoning is applied at the granularity of a single character, num4 = 1.
Now, the hierarchical representation of traditional poetry text works can be formally described as follows:where the number of elements in the set M is . Therefore, the following hierarchical relationship expressed in poetry text works can be expressed as follows:
This hierarchical representation describes how the poetry text works at the lowest level are composed of structural ellipses. Each higher level describes how to generate a representation of its current level from the information of the lower level. In essence, it is a tree-shaped knowledge representation.
At the 0th level of the hierarchical representation, a poetry text work is regarded as a series of ellipses, called the structural ellipses of the work. Each structural ellipse is described by its center, its semimajor axis $a_i$, and its semiminor axis $b_i$. The image of the poetic text work C is then the image area covered by this series of ellipses.
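The idea of a stroke as the area covered by a series of ellipses can be sketched as follows. This is a minimal illustration under stated assumptions: the `Ellipse` class, the axis-aligned orientation, and the sample coordinates are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ellipse:
    """A structural ellipse: center (cx, cy), semimajor axis a, semiminor axis b.

    Assumption: the ellipse is axis-aligned, with `a` along the x direction.
    """
    cx: float
    cy: float
    a: float
    b: float

    def contains(self, x, y):
        return ((x - self.cx) / self.a) ** 2 + ((y - self.cy) / self.b) ** 2 <= 1.0


def covered(ellipses, x, y):
    """A point belongs to the stroke image if any structural ellipse covers it."""
    return any(e.contains(x, y) for e in ellipses)


def rasterize(ellipses, width, height):
    """Render the covered area on a coarse grid, sampling each cell center."""
    return [
        "".join("#" if covered(ellipses, x + 0.5, y + 0.5) else "." for x in range(width))
        for y in range(height)
    ]


# Hypothetical horizontal stroke: five ellipses whose centers move along a line.
stroke = [Ellipse(cx=2.0 + i, cy=2.0, a=1.5, b=1.0) for i in range(5)]
for row in rasterize(stroke, 9, 4):
    print(row)
```

Moving the centers along a curve instead of a straight line would give a curved band, which matches the description of an ellipse moving along a predefined curve.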
If a lower-level element cannot be merged with other elements at the same level, it is promoted to an element of the adjacent higher level. In the same way, a radical may be downgraded to a combined stroke, or even to an original stroke. Through these promotions and demotions, we obtain a unified 6-level hierarchical system, which is used in the generation of poetry text.
We denote the first structural ellipse of the poem text work as , and its corresponding topological structure is . contains topology information, and through it, can be used to form .
We denote the bounding box of the image area occupied by element as . Here, are the coordinates of the lower left corner of the bounding box and its height and width, respectively. All coordinates are in the world coordinate system. Therefore, we haveHere, is the scaling and transition matrix, respectively. The strict mathematical definition of matrix is as
If m must be a combination of one or more construction elements one level lower. We call the latter substructural elements. All the information needed to combine is stored in , which is the topology constructor of .
Through the topology constructor, we can build a one-to-one mapping of different levels of hierarchical representation. In other words, any hierarchical representation pixel of the s-th structural element belonging to the k-th level can be uniquely mapped to the hierarchical representation pixel of the c-th structural element of the first level. Among them, .
If we introduce the operator ,
Then, there will be a relationship:
The s-th structural element of the k-th level of the hierarchical system is denoted as , and its parameter is denoted as . We now introduce a new matrix operator , which can be used to generate an -dimensional matrix and n-dimensional matrices . If , then we have
We further define a matrix operator
Then, the hierarchy of poetry text works can be formally expressed as follows:
At this time, the operator is defined as follows:
Among them, is the number of columns of matrix , and is an -dimensional zero matrix.
In the above equation, each represents an area covered by the constructed ellipse, and + . This ellipse is the normalized version of the original ellipse for its bounding box. That is, if and is assumed, we have
An element above the 0th level is composed of adjacent lower-level elements. Its shape matrix is obtained by concatenating the shape matrices of those lower-level elements column by column. Because the parameterized representation of a structural ellipse is a $4 \times 1$ matrix, this concatenation always yields a matrix with exactly 4 rows at every higher level. Each row of the resulting matrix is called a parameterized representation domain of the element. The different domains of an element can be inferred independently.
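The column-wise construction of shape matrices can be sketched in a few lines. This is only an illustration: the row order (center x, center y, semimajor axis, semiminor axis) and the helper names are assumptions, since the text states only that each ellipse contributes one column and that every shape matrix has 4 rows.

```python
def shape_matrix(columns):
    """Build a 4 x n shape matrix (a list of 4 rows) from 4-element ellipse columns.

    Assumed column layout: [center_x, center_y, semimajor, semiminor].
    """
    if not all(len(c) == 4 for c in columns):
        raise ValueError("each ellipse must be described by exactly 4 parameters")
    return [list(row) for row in zip(*columns)]


def concat_columns(matrices):
    """Concatenate 4-row matrices column by column to form a higher-level matrix."""
    return [sum((m[r] for m in matrices), []) for r in range(4)]


# Two hypothetical lower-level elements with two and one structural ellipses.
stroke_a = shape_matrix([[0, 0, 2, 1], [1, 0, 2, 1]])
stroke_b = shape_matrix([[2, 1, 1, 1]])

combined = concat_columns([stroke_a, stroke_b])
for row in combined:
    print(row)
```

Because each row collects one kind of parameter across all ellipses, a row can be processed on its own, which is consistent with the statement that the domains of an element can be inferred independently.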
The weight associated with the s-th field of the i-th source knowledge on the k-th level; that is, the weight of the i-th training example is denoted as . is the reasoning strength of in synthetic reasoning. Therefore, the set of similar reasoning strengths arranged in order constitutes the “view sequence” of the current comprehensive reasoning process. This is what we said that different sequences of showing training examples will lead to different results of poetry text.
For the comprehensive reasoning process generated by novel poetry text works, we assume that n comprehensive source knowledge at the k-th level participates in this reasoning process. Moreover, each comprehensive source knowledge consists of m components, where . Therefore, we can use an -dimensioned matrix to represent a single comprehensive reasoning source and use an -dimensioned matrix to represent all active sources in the current comprehensive reasoning. If we apply synthetic reasoning at the single-character level, then . The t-th attribute of source knowledge is denoted as , which has an associated strength in the process of comprehensive reasoning, and they have the following relationship:
In the intelligent poetry text generation system, this reasoning intensity can be adjusted through a graphical user interface. The new knowledge generated in the synthetic reasoning process is denoted as , and its matrix form parameterization is denoted as . To apply the principle of comprehensive reasoning process, all source knowledge in reasoning must have equal dimensions. That is to say, if we apply synthetic reasoning on source knowledge and , their corresponding matrix form parameterized representations and must satisfy the relationship . This equal requirement is a soft constraint of the synthetic reasoning process. If it is violated, we can introduce a source knowledge equality operator to relax this soft constraint. This is very similar to the mapping problem in similar reasoning. Next, we will introduce a few symbols and define in a strict mathematical way.
We first denote , where
We can also obtain a discrete curve composed of the centers of all the covering ellipses for each structural element in the intelligent poetry text generation system, which can be expressed as . Its corresponding parameterization is expressed as . Therefore, a dispersion curve acquisition method can be described as follows:
Then, an overview of the key columns of a matrix can be introduced and defined. If a curve has key points and its coordinates are respectively, then the key column of can be identified as . Aiming at the key point extraction of a given plane curve , we use the algorithm introduced in this paper. Moreover, more refined key point extraction algorithms are introduced in, but they may cause more serious computational overhead.
We assume that now we have identified key columns, which are the -th matrix column, through the parameterized representation of a matrix form of a comprehensive reasoning source , where . Therefore, we can define the capacity equality operator of the comprehensive reasoning source as follows:
 is an integer trimming operator. Specifically, if each column of matrix is considered to be a key column, we can use a simpler mapping to define the operator
Among them, , and  is an integer trim function. Based on the definition of , we can further define a comprehensive inference source knowledge superposition operator
Through the operator , we can obtain the matrix representation of the superimposed comprehensive reasoning source:
With this operator, we can further calculate the feature matrix of all comprehensive inference source knowledge:
In the above equation, b, , nor are the matrix representations of standard source knowledge at the k-th level.
According to the user input intensity for different comprehensive reasoning sources, we can calculate the comprehensive reasoning viewpoint matrix:
Among them, is a dimensional unit matrix.
Now, we can generate a comprehensive reasoning feature result through the comprehensive reasoning process described by the following form:
In the above equation, is a comprehensive reasoning simulation operator. If the synthetic reasoning process models human creative thinking as a linear process, then , where is an ordinary matrix multiplication operator. If the comprehensive reasoning process models human creative thinking as a z-order polygon, then will be defined as follows:
We can also model creative thinking as a geometric average process, as shown in the equation of (25):
Generally speaking, we can use the comprehensive reasoning model to overload the comprehensive reasoning simulation operator to achieve the purpose of simulating different types of creative thinking.
Finally, by adding the shape of , which is the standard structural element related to the inference result in the process of comprehensive inference, we obtain the parameterized representation of , where is the matrix form of the parameterized representation of the shape of .
The definition of the operator can be overloaded to realize different kinds of creative thinking activities in intelligent systems. Some simple synthetic reasoning simulation operators for topology constructors are (1) the arithmetic mean, (2) the geometric mean, and (3) the harmonic mean.
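The three means named above can be sketched as interchangeable operators. The weighted forms below are standard textbook definitions, not formulas recovered from the paper; the function names, the requirement that the weights sum to 1, and the sample values are assumptions made for illustration.

```python
import math


def _check(values, weights):
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")


def arithmetic_mean(values, weights):
    _check(values, weights)
    return sum(w * v for v, w in zip(values, weights))


def geometric_mean(values, weights):
    _check(values, weights)  # values must be positive
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))


def harmonic_mean(values, weights):
    _check(values, weights)  # values must be nonzero
    return 1.0 / sum(w / v for v, w in zip(values, weights))


def synthesize(sources, weights, operator):
    """Combine equally sized parameter lists entry by entry with `operator`."""
    return [operator(list(column), weights) for column in zip(*sources)]


# Two hypothetical sources, each giving (semimajor, semiminor) for one ellipse.
sources = [[2.0, 4.0], [8.0, 4.0]]
weights = [0.5, 0.5]
for op in (arithmetic_mean, geometric_mean, harmonic_mean):
    print(op.__name__, synthesize(sources, weights, op))
```

Passing a different function as `operator` corresponds to overloading the simulation operator: the combination step stays the same while the kind of averaging changes.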
The integrated reasoning process proposed in this paper is essentially either an interpolation process or an extrapolation process. In other words, is the interpolation weight or extrapolation weight of , and the constraint needs to be satisfied here.
4. Analysis of Poetry Style Based on Text Classification Algorithm
In the research of this paper, the judgment of poetry style based on machine learning, that is, the flow chart of the pattern recognition of poetry text, is shown in Figure 1.
The development and design of the system are shown in Figure 2 according to the research and processing sequence.
Judging the style of poetry follows the general process of text classification. First, the algorithm preprocesses the poetry documents in the corpus and maps them to a vector space that can be processed directly by the computer. The algorithm then selects poem documents that have been labeled with a style in advance as the training corpus and the test corpus. Subsequently, the algorithm uses machine learning methods to generate model data for style classification. Finally, the algorithm builds a machine judging tool for poetry style on the basis of the model data evaluated on the test corpus. This tool can judge the style of the other poetry documents in the corpus. The flow chart of style evaluation is shown in Figure 3.
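The train-then-judge flow described above can be sketched with a small multinomial naive Bayes classifier. This is an illustrative sketch, not the system evaluated in the paper: the class name, the Laplace smoothing, the English stand-in tokens, and the "bold" and "graceful" labels on the toy documents are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesStyleClassifier:
    """Multinomial naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical pre-labeled training corpus (already segmented into tokens).
train_docs = [
    ["river", "mountain", "army", "wind"],
    ["iron", "horse", "mountain", "river"],
    ["willow", "tears", "moon", "rain"],
    ["rain", "curtain", "tears", "dream"],
]
train_labels = ["bold", "bold", "graceful", "graceful"]

model = NaiveBayesStyleClassifier().fit(train_docs, train_labels)
print(model.predict(["mountain", "horse", "wind"]))
print(model.predict(["tears", "moon"]))
```

In a full pipeline the held-out test corpus would be passed through `predict` to estimate accuracy before the classifier is used on unlabeled poems.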
A poem consists of text and is therefore unstructured data, which is difficult to analyze directly with quantitative methods. Measurable indicators must be introduced before the data can be analyzed and processed. The common approach is to segment the text and then count word frequencies. High-frequency words that often appear in poems, such as “Wanli,” “Jiangshan,” “Dongfeng,” and “Twilight,” generally reflect the emotional tendency and stylistic characteristics of the poet’s choice of imagery. To classify and cluster poets and their schools, this paper mainly uses high-frequency word statistics to classify and aggregate the style characteristics of the poets. The cluster analysis of word style is shown in Figure 4.
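High-frequency word statistics of this kind can be sketched with the standard library. The example is illustrative only: the function names and the three tiny "poems" built from the words quoted above are hypothetical, and real input would come from the segmentation step.

```python
from collections import Counter


def top_words(segmented_poems, k):
    """Return the k most frequent words across all segmented poems."""
    counts = Counter(word for poem in segmented_poems for word in poem)
    return counts.most_common(k)


def frequency_vector(poem, vocabulary):
    """Relative frequency of each vocabulary word within one segmented poem."""
    counts = Counter(poem)
    total = len(poem) or 1  # avoid division by zero for an empty poem
    return [counts[word] / total for word in vocabulary]


# Hypothetical segmented poems using the imagery words quoted in the text.
poems = [
    ["Wanli", "Jiangshan", "Wanli"],
    ["Dongfeng", "Twilight", "Jiangshan"],
    ["Wanli", "Dongfeng"],
]

print(top_words(poems, 1))
print(frequency_vector(poems[0], ["Wanli", "Jiangshan", "Dongfeng"]))
```

Vectors of this form, computed per poet over a shared vocabulary of high-frequency words, are the kind of measurable indicator that a clustering step can then group.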
Word2Vec obtains word vectors efficiently: it can be trained on vocabularies of millions of words and datasets of hundreds of millions of tokens. The resulting word vectors measure the semantic relatedness and similarity between words well. Word2Vec includes two network models, the CBOW (Continuous Bag-of-Words) model and the Skip-Gram model, whose network structures are shown in Figure 5. Skip-Gram can be seen as the reverse of CBOW: CBOW predicts the center word from its context, whereas Skip-Gram predicts the context words from the center word. In essence, the CBOW model is a feedforward neural network language model (feedforward NNLM).
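The difference between the two Word2Vec models shows up clearly in the training examples each one consumes. The sketch below only generates those examples and does not train any network; the function names and the symmetric window are assumptions for illustration.

```python
def cbow_pairs(tokens, window):
    """CBOW examples: (list of context words, center word to predict)."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window):
    """Skip-Gram examples: (center word, one context word to predict)."""
    return [
        (target, context_word)
        for context, target in cbow_pairs(tokens, window)
        for context_word in context
    ]


tokens = ["a", "b", "c"]
print(cbow_pairs(tokens, 1))
print(skipgram_pairs(tokens, 1))
```

Each CBOW example maps many inputs to one output, while Skip-Gram splits the same window into several one-to-one examples with the direction reversed.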
An RNN is a neural network designed specifically for sequence problems, and its basic network structure is shown in Figure 6.
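The recurrence that makes an RNN suitable for sequences can be written in a few lines. This sketch assumes the basic Elman form with a tanh activation, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the paper's exact architecture is given only in its figure, so the weights and names below are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One recurrent step: the new hidden state depends on input and previous state."""
    pre = [
        xi + hi + bi
        for xi, hi, bi in zip(matvec(w_xh, x), matvec(w_hh, h_prev), b_h)
    ]
    return [math.tanh(p) for p in pre]


def rnn_forward(inputs, h0, w_xh, w_hh, b_h):
    """Run the recurrence over a sequence and return every hidden state."""
    h, states = h0, []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


# Hypothetical 2-dimensional example: identity input weights, small recurrence.
w_xh = [[1.0, 0.0], [0.0, 1.0]]
w_hh = [[0.5, 0.0], [0.0, 0.5]]
b_h = [0.0, 0.0]
sequence = [[1.0, 0.0], [0.0, 1.0]]

for state in rnn_forward(sequence, [0.0, 0.0], w_xh, w_hh, b_h):
    print(state)
```

The second hidden state still carries a trace of the first input through `w_hh`, which is the property that lets the network model dependencies between positions in a line of poetry.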
This article summarizes the sequence learning methods of the above two statistical language models. The framework of the joint language response model based on statistical learning is shown in Figure 7.
In this paper, Brill’s transformation-based learning method is used to handle the generated context-sensitive rules in combination with the language model. The flowchart of the learning algorithm is shown in Figure 8.
The text visualization technology is mainly composed of three modules: text analysis and processing, visual design presentation, and interactive design, as shown in Figure 9. First, the algorithm collects text information, preprocesses data and expresses knowledge, and transforms it into a corresponding structured data format. Then, the algorithm visually presents the information with graphics and color elements through visual coding. In the visual presentation, the algorithm should provide users with effective and reasonable visual presentation and interactive operations based on the user’s cognitive characteristics. At the same time, the algorithm performs data operations on the visualized information elements through a reasonable interactive method to facilitate users to quickly and clearly understand the displayed information content, thereby realizing a complete visual design implementation process.
After constructing the above model, the effect of the model is verified, and the system function is verified based on actual needs. Moreover, this paper combines the simulation test to verify the effect of the poetry text classification and the poetry style verification of the system of this paper. The poetry text classification is shown in Table 1 and Figure 10.
From the above research, it can be seen that the poetry style analysis method based on the text classification algorithm proposed in this paper has a good classification effect. The experimental results of the poetry style analysis are shown in Table 2 and Figure 11.
From the above research, the effect of the poetry style analysis method based on the text classification algorithm proposed in this paper is very good, which meets the actual needs of poetry style analysis.
5. Conclusion
Before poetry text can be classified, the unstructured document data must be represented in a form that the computer can process. This requires preprocessing and feature representation of the documents, which convert unstructured or semistructured data into structured data. The process includes word segmentation, filtering of low-frequency words and stop words, feature representation, and feature extraction. Word segmentation adds a separator between the entries of a Chinese document, converting its continuous character stream into a discrete word stream. The main word segmentation methods currently in use are the forward and reverse maximum matching methods, the word-by-word traversal method, the best matching method, and the word frequency statistics method; there are also the two-scan method, the adjacency constraint method, and others. This paper applies the text classification algorithm to the analysis of poetry style and constructs an intelligent classification system. The research results show that the proposed poetry style analysis method based on the text classification algorithm is effective and meets the practical needs of poetry style analysis.
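Of the segmentation methods listed above, forward maximum matching is the simplest to sketch. The example below is a minimal illustration, not the segmenter used in this study: the function name, the maximum word length, and the two-entry dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` by greedily taking the longest dictionary word at each position.

    Characters that start no dictionary word are emitted as single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical tiny dictionary; a real system would load a poetry vocabulary.
dictionary = {"明月", "故乡"}
segments = forward_max_match("举头望明月低头思故乡", dictionary)
print(" / ".join(segments))
```

Reverse maximum matching applies the same greedy rule from the end of the line toward the beginning, and comparing the two outputs is a common way to spot ambiguous segmentations.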
Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The author declares no conflicts of interest.
Acknowledgments
This study was sponsored by Henan Industry and Trade Vocational College.
References
M. H. Aghdam and S. Heidari, “Feature selection using particle swarm optimization in text categorization,” Journal of Artificial Intelligence and Soft Computing Research, vol. 5, no. 4, pp. 38–43, 2015.
S. T. Gissel, “Scaffolding students’ independent decoding of unfamiliar text with a prototype of an eBook-feature,” Journal of Information Technology Education: Research, vol. 14, pp. 439–470, 2015.
R. R. Kamble and D. V. Kodavade, “Relevance feature search for text mining using FClustering algorithm,” International Journal of Computer Science and Engineering, vol. 6, no. 7, pp. 223–227, 2018.
M. Pandi and V. D. K. Rajendran, “Efficient feature extraction for text mining,” Advances in Natural & Applied Sciences, vol. 10, no. 4, pp. 64–73, 2016.
M. Mojaveriyan, H. Ebrahimpourkomleh, and S. J. Mousavirad, “IGICA: a hybrid feature selection approach in text categorization,” International Journal of Intelligent Systems Technologies and Applications, vol. 8, no. 3, pp. 42–47, 2016.
M. Nayak and A. K. Nayak, “Odia running text recognition using moment-based feature extraction and mean distance classification technique,” Advances in Intelligent Systems & Computing, vol. 309, no. 3, pp. 497–506, 2015.
E. Ramalakshmi and S. Golla, “An advanced fuzzy constructing algorithm for feature discovery in text mining,” International Journal of Computers and Applications, vol. 127, no. 17, pp. 30–34, 2015.
M. Soleymanpour and H. Marvi, “Text-independent speaker identification based on selection of the most similar feature vectors,” International Journal of Speech Technology, vol. 20, no. 1, pp. 1–10, 2016.
T. Oki, “Scene text localization using object detection based on filtered feature channels and crosswise region merging,” Growth & Change, vol. 21, no. 3, pp. 61–76, 2015.
A. Tommasel and D. Godoy, “A social-aware online short-text feature selection technique for social media,” Information Fusion, vol. 40, no. 1, pp. 1–17, 2017.
G. Wu, M. Zhao, and L. Han, “A fingerprint feature extraction algorithm based on optimal decision for text copy detection,” International Journal of Security & Its Applications, vol. 10, no. 11, pp. 67–78, 2016.
Z. Robati, M. Zahedi, and N. F. Far, “Feature selection and reduction for Persian text classification,” International Journal of Computers and Applications, vol. 109, no. 17, pp. 1–5, 2015.
T. Zia, Q. Abbas, and M. P. Akhtar, “Evaluation of feature selection approaches for Urdu text categorization,” International Journal of Intelligent Systems Technologies & Applications, vol. 7, no. 6, pp. 33–40, 2015.
T. Zia, M. P. Akhter, and Q. Abbas, “Comparative study of feature selection approaches for Urdu text categorization,” Malaysian Journal of Computer Science, vol. 28, no. 2, pp. 93–109, 2015.
Li. De, Z. J. Xue, and C. Lih, “Text recognition algorithm based on text features,” International Journal of Multimedia & Ubiquitous Engineering, vol. 11, no. 5, pp. 209–220, 2016.
K. Yan, Z. Li, and C. Zhang, “A new multi-instance multi-label learning approach for image and text classification,” Multimedia Tools and Applications, vol. 75, no. 13, pp. 7875–7890, 2016.
G. Kumar and K. Vivekanandan, “Intelligent model view controller based semantic webservice call through mishmash text featuring technique,” Journal of Computational and Theoretical Nanoscience, vol. 14, no. 4, pp. 2021–2029, 2017.
B. Shi, X. Bai, and C. Yao, “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 39, no. 11, pp. 2298–2304, 2017.
M. M. Al-Tahraw, “Polynomial neural networks versus other Arabic text classifiers,” Journal of Software, vol. 11, no. 5, pp. 418–430, 2016.
Y. Hu, E. E. Milios, and J. Blustein, “Document clustering with dual supervision through feature reweighting,” Computational Intelligence, vol. 32, no. 3, pp. 480–513, 2016.