Abstract

To improve the mutual translation of Chinese and Korean literature, this article applies semantic analysis technology to the task. It explores Chinese-Korean literary mutual translation through intelligent semantic analysis methods and analyzes the theoretical basis of densely connected networks on the basis of semantic analysis and verification methods built on densely connected networks and multilayer attention. Finally, it implements semantic matching verification through a densely connected BiLSTM structure and adds a multilayer attention mechanism to the network so that sentences interact more fully and their semantic relationship can be obtained. The simulation analysis shows that the Chinese-Korean literary mutual translation system based on semantic analysis proposed in this article achieves a good translation effect and can effectively promote Sino-Korean literary exchange.

1. Introduction

Since the establishment of diplomatic relations between China and South Korea, exchanges and cooperation between the two countries in politics, economy, trade, diplomacy, and culture have continued to deepen. Both countries are deeply influenced by Confucianism and share much of their cultural heritage, so Chinese-Korean translation practice offers many similar rules and techniques to follow.

Traditional translation studies focus on comparing words and sentences between the original text and the translation and pay too much attention to morphological and syntactical analysis, failing to deal with the translation of units larger than the sentence, that is, textual translation problems. The establishment and development of text linguistics, however, have had a huge impact on translation studies. Translation is the process of converting the meaning expressed in the language and writing of one country into the language and writing of another, so that readers can accurately understand the connotation of the original work. Translation attends to “faithfulness, expressiveness, and elegance,” among which “faithfulness” is the most critical. Three principles should be followed when translating Chinese and Korean works. First, one must be faithful to the original text, that is, accurately convey the things, phenomena, and truths described in the original work and the author’s thoughts and feelings. Second, the language should be fluent and clear, and rigid translation should be avoided; otherwise the meaning of the translation becomes confused and the text does not read smoothly. Third, the original style should be preserved as much as possible. In the process of translation, therefore, translators must consider many factors, not only understanding the literal meaning of words but also taking into account the cultural environment and situation in which the text is embedded.

Before Western stylistic theories were introduced into China, the Chinese understanding of “style” was relatively simple. Style mostly referred to “genres of writing” (the four categories: poetry, fiction, drama, and prose). At the same time, it also referred to the relatively stable and distinctive manner of works belonging to different systems and styles, a stipulation of the literary genre itself. After the mid-1980s, Western stylistic theories were introduced to China, and foreign-language scholars usually treated stylistics, style, and register as synonymous concepts. The three meanings of “genre,” “norms,” and “style” contained in the Korean word for style are concentrated in the Chinese word for “style,” so the meaning of that Chinese word has expanded considerably. At present, in modern literary theory, style generally has three levels of meaning: the first is the genre of the work; the second is the linguistic form of the work; the third is the writer’s individual style or genre characteristics.

In recent years, the term “style” has come to function like critical concepts such as “structure” and “form” and carries various meanings: sometimes it refers to genre, sometimes to stylistic norms, and sometimes to writing style. Literary style is a system with certain rules and a certain flexibility, established according to a collective aesthetic taste, and its generation and evolution directly reflect the aesthetic choices and social mentality of the times. On the surface, style is the linguistic order and manner of the work; at a deeper level, style also carries the cultural spirit of society and the personality of the author and is connected with the social and cultural spirit. Generally speaking, stylistics is an extremely important dimension of translation studies, and the growth of stylistics in particular has promoted the development of literary translation studies grounded in stylistics. The relationship between literary stylistics and novel translation is generally regarded as complicated. Unfortunately, the real combination of stylistics and translation studies only gradually emerged in the early 1980s, and in the field of comparative Chinese-Korean language studies the relevant research results are still very rare.

This study analyzes the mutual translation of Chinese and Korean literature by combining semantic analysis technology, explores the analysis of Chinese and Korean literature mutual translation through intelligent semantic analysis methods, and improves the effect of Chinese and Korean literature mutual translation.

2. Related Work

Similarity calculation methods based on character matching determine the degree of similarity of two texts from surface features such as part of speech and word form [1]. Reference [2] uses the minimum edit distance to measure the degree of similarity between texts. Reference [3] uses the Jaccard distance, representing the similarity between texts by the proportion of shared k-grams, where k is the size of the n-gram window used. Simple string matching, however, cannot meet the needs of text similarity calculation. Because the semantic representations of Chinese are diverse and complex, similarity calculation based on surface strings cannot solve practical problems; text similarity must be computed at the semantic level, and such matching involves mining semantics and the problem of how to represent them [4]. Existing algorithms mainly follow two directions: statistical methods and semantic rules. Statistics-based text similarity algorithms are used in statistical natural language processing; they rely entirely on the corpus and calculate similarity according to the frequency of keywords in the text [5]. According to the different forms of constructing vectors, statistics-based text similarity algorithms can be divided into three models: the vector space model, the topic model, and the neural network model. The vector space model (VSM) represents a text as a group of independent feature vectors (p1, p2, ..., pn), assigns each feature a specific weight according to its importance to the semantics of the text, combines the weighted features into a text vector space as coordinate values, and calculates text similarity from the angle between two vectors [6]. VSM requires a large-scale, high-quality, and complete corpus, but it is impossible to cover all corpora in reality, so a high-dimensional sparse matrix problem arises [7].
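For illustration, the following minimal Python sketch computes two of the surface-level measures discussed above, Jaccard similarity over character n-grams and cosine similarity of term-frequency vectors; the function names and example strings are illustrative, not taken from the cited references.

```python
# Surface-level similarity sketch: Jaccard over character n-grams and
# cosine similarity of simple term-frequency vectors.
from collections import Counter
import math

def char_ngrams(text, k=2):
    """Return the set of character n-grams of size k."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard_similarity(a, b, k=2):
    """|intersection| / |union| of the two n-gram sets."""
    sa, sb = char_ngrams(a, k), char_ngrams(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def cosine_similarity(a, b):
    """Cosine of the angle between term-frequency vectors of two texts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(jaccard_similarity("machine translation", "literary translation"))
print(cosine_similarity("chinese korean literary translation",
                        "korean chinese translation"))
```

Such measures only count shared surface units, which is exactly the limitation that motivates the semantic-level approaches discussed next.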

The topic model holds that each text has its own topics; a topic links the core keywords to the text and can represent the underlying semantic information of the text, so the similarity of two texts does not depend only on surface information such as word frequency and word form, and mining the hidden semantic associations of the text is the key [8]. The LSA (latent semantic analysis) model organizes and summarizes the words in a large-scale corpus, generates a matrix composed of terms and documents, and uses singular value decomposition to filter out useless singular values and reduce noise. This alleviates the high-dimensional sparsity problem, and text similarity is then represented by vector distances in the low-dimensional space [9]. Another common topic model, the LDA (latent Dirichlet allocation) model, mainly models the topic information of discrete data and can identify topic information in corpora and large-scale text sets. LDA calculates the topic probability distribution by mining the representative words of a text, and text similarity is obtained by comparing the corresponding topic probability distributions. This makes LDA suitable only for similarity calculation on long texts: short texts contain few representative words, so LDA cannot achieve the expected results in topic mining of short texts [10]. Reference [11] uses latent Dirichlet allocation to establish the topic space of the text and enhance its vectorized representation; it models only the topics of documents and retains the topic information that can represent the semantics of the text, which works well for processing large-scale document sets. With the development of deep learning methods and the substantial improvement in computing performance in recent years, neural network models have also received extensive attention in text similarity calculation. Text similarity algorithms based on language rules mainly use manually constructed semantic knowledge bases to calculate text similarity. Different semantic knowledge bases can use different organizational forms of concepts as the feature items of words for similarity calculation, including hyponymy between concepts, synonymy and antonymy relationships, and elements of the tree-like concept hierarchy, such as the path length between nodes, network density, the depth of a node in the tree, and the amount of information a node contains [12].
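As a hedged sketch of the LSA idea described above, the following snippet builds a TF-IDF term-document matrix, reduces it with truncated SVD, and compares documents by cosine similarity in the low-dimensional topic space; the toy corpus and the number of components are assumptions for illustration only.

```python
# LSA-style similarity: TF-IDF + truncated SVD + cosine in topic space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the translator preserves the style of the original novel",
    "literary style and genre shape the translated text",
    "neural networks compute similarity between sentence vectors",
]

tfidf = TfidfVectorizer().fit_transform(corpus)      # term-document matrix
lsa = TruncatedSVD(n_components=2, random_state=0)   # drop small singular values
topic_vectors = lsa.fit_transform(tfidf)             # documents in topic space

# Similarity of the first document to the other two in the reduced space.
print(cosine_similarity(topic_vectors[:1], topic_vectors[1:]))
```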

Unsupervised hashing algorithms learn hash functions from unlabeled data and aim to keep the learned hash codes as faithful to the original data as possible [13]. Locality-sensitive hashing (LSH) [14] maps the original data into compact hash codes by selecting hash functions that satisfy locality sensitivity, which greatly reduces the dimension of the data; the distances between the compact hash codes are then computed to speed up sample queries. The hashing algorithm based on graph structure [15] learns suitable hash codes by discovering the inherent neighborhood structure; to speed up the computation, an anchor graph is used to obtain an easy-to-handle low-rank adjacency matrix, and a multibit hash code is finally produced. The iterative quantization algorithm (ITQ) [16] first extracts feature vectors from the high-dimensional data and retains the components corresponding to the leading eigenvalues, then maps these dimensionality-reduced feature vectors to the vertices of a hypercube and minimizes the mapping error, and performs hash learning by iterating these operations.
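The following minimal sketch illustrates the general random-projection flavor of locality-sensitive hashing rather than the exact algorithms of [14]–[16]: each hash bit is the sign of a projection onto a random hyperplane, so nearby vectors tend to receive similar compact codes, and queries are answered by Hamming distance.

```python
# Random-projection hashing sketch (illustrative, not the cited algorithms).
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(vectors, planes):
    """Each bit is the sign of the projection onto a random hyperplane."""
    return (vectors @ planes > 0).astype(np.uint8)

data = rng.standard_normal((1000, 128))             # toy feature vectors
planes = rng.standard_normal((128, 16))             # 16 hyperplanes -> 16-bit codes
codes = lsh_hash(data, planes)

query = data[0] + 0.01 * rng.standard_normal(128)   # slightly perturbed copy of item 0
query_code = lsh_hash(query[None, :], planes)

# Hamming distance between compact codes replaces costly full-vector search.
hamming = (codes != query_code).sum(axis=1)
print("nearest item by Hamming distance:", int(hamming.argmin()))
```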

3. Semantic Analysis Technology Algorithm

In order to improve the accuracy of semantic matching, text data are compiled from literary translations and then processed.

Literary translation terminology is a special language system. Before the data are sorted, the characteristics of literary translation must be analyzed in order to clarify the characteristics of the data and make an accurate judgment about how to process them. In the process of literary translation, both the Chinese and the Korean language must follow strict standards. Furthermore, both parties must ensure that the language is accurate and unambiguous and that synonyms are not used in place of standard vocabulary. According to the literary translation standard, all key instructions need to be read back during translation, and both an incorrect and an incomplete readback will affect the translation results.

After the text data of the literary content are converted, they need to be organized and labeled. According to the literary translation standard, the content mainly includes three types of dialogues: command-recitation, command-response, and request-response. Errors in recitation-type dialogues can be classified into errors in the readback information and missing recitation content, and errors in question-and-answer dialogues can be classified into irregular language and incomplete answers. Among them, missing recitation and incomplete answers both belong to the problem of missing information, so they are uniformly marked as incomplete content. Finally, the literary translation data are divided into four parts, labeled as correct, recitation error, irregular terminology, and incomplete content. The labeling specification for each type of data is as follows [17]:

3.1. Correct

Data where the instruction is consistent with the response message and the wording conforms to the standard are marked as correct.

3.2. Recitation Error

Data in which the readback is inconsistent with the instruction are marked as a recitation error, for example, errors in reading back altitude, heading, call sign, or runway number.

3.3. Irregular Terminology

When the use of irregular wording makes the information ambiguous, or the response contains content irrelevant to the instruction, the data are marked as irregular terminology.

3.4. Incomplete Content

Data in which the response does not fully cover the content required by the instruction are marked as incomplete.

According to the labeling specification, a double-labeling method is used. If the two labeling results are the same, the label is regarded as valid, and the labeled data are stored in the database. If the two results differ, professional air traffic controllers adjudicate the label type to ensure the accuracy of the labeling.

After labeling is completed, each sample in the dataset consists of an instruction sentence, a recitation sentence, and a label. Labels are represented by numbers: 0, 1, 2, and 3 indicate correct, recitation error, incomplete content, and irregular terminology, respectively.
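A possible in-memory representation of a labeled sample is sketched below; the field names are assumptions, and only the label encoding 0–3 follows the description above.

```python
# Illustrative labeled-sample structure; field names are hypothetical.
LABELS = {
    0: "correct",
    1: "recitation error",
    2: "incomplete content",
    3: "irregular terminology",
}

sample = {
    "instruction": "...",   # instruction sentence
    "readback": "...",      # recitation / response sentence
    "label": 1,             # one of the four classes above
}
print(LABELS[sample["label"]])
```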

Deep networks are an important analytical method in natural language processing and are better able to capture semantic matching and mismatching relationships, but as the number of network layers increases, problems of parameter excess and overfitting follow. Therefore, on the basis of a study of deep networks, this section proposes a multilayer densely connected network with multilayer attention to achieve semantic analysis of literary translation. As shown in Figure 1, the network consists of four parts. First, the sentence vector representation is obtained from the input mapping layer. Then, the sequences are semantically encoded by a densely connected network, and an attention mechanism is added to the network so that the sentences interact. Finally, the obtained semantic vectors are pooled and further processed, and a fully connected network performs the semantic matching verification.

Introducing rich information at the input stage facilitates subsequent semantic analysis, so sentences are represented by a combination of word vectors and feature flags. Through the input layer, the vector representation of the instruction sentence and the vector representation of the reply sentence can be obtained, where each element is the vector representation of the jth word of the instruction or of the reply, and m and n denote the sentence lengths of the instruction and the reply, respectively [18].
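A hedged PyTorch sketch of such an input mapping layer is given below: each token is represented by a word embedding concatenated with a small feature-flag embedding. The class name and all dimensions are illustrative assumptions, not values from this article.

```python
# Input mapping layer sketch: word embedding + feature-flag embedding.
import torch
import torch.nn as nn

class InputLayer(nn.Module):
    def __init__(self, vocab_size, n_flags, emb_dim=300, flag_dim=20):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.flag_emb = nn.Embedding(n_flags, flag_dim)

    def forward(self, word_ids, flag_ids):
        # (batch, seq_len) -> (batch, seq_len, emb_dim + flag_dim)
        return torch.cat([self.word_emb(word_ids), self.flag_emb(flag_ids)], dim=-1)

layer = InputLayer(vocab_size=5000, n_flags=4)
words = torch.randint(0, 5000, (2, 10))
flags = torch.randint(0, 4, (2, 10))
print(layer(words, flags).shape)    # torch.Size([2, 10, 320])
```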

Stacking a multilayer RNN simply uses the output sequence of the previous layer as the input of the next layer, which causes exploding and vanishing gradients and makes the multilayer network difficult to train. The dense connection method, by contrast, achieves a better representation of semantic information through feature reuse. Since BiLSTM is suitable for processing sequences, a multilayer BiLSTM structure is used in the dense network layer, and the sequence is semantically encoded through concatenation. The network structure is shown in Figure 2.

A densely connected network does not hinder the transmission of information and at the same time retains the original information, so that the output of the first layer can be effectively transmitted to the last layer, avoiding problems such as vanishing gradients. The hidden states in the network are shown in formulas (1) and (2), where H denotes the BiLSTM transformation and l denotes the number of layers of the network. After the sequence obtained from the input layer is encoded by the dense network, the semantic vector of the instruction and the semantic vector of the reply are obtained.
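The dense connection idea can be sketched in PyTorch as follows: the input of each BiLSTM layer is the concatenation of the previous layer's input and output, so early features are reused at every depth. This is a minimal sketch in the spirit of formulas (1) and (2); the layer sizes are assumptions.

```python
# Densely connected BiLSTM stack: each layer sees its input plus its output.
import torch
import torch.nn as nn

class DenseBiLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim=100, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = input_dim
        for _ in range(num_layers):
            self.layers.append(nn.LSTM(dim, hidden_dim, batch_first=True,
                                       bidirectional=True))
            dim = dim + 2 * hidden_dim        # dense connection grows the input

    def forward(self, x):
        for lstm in self.layers:
            out, _ = lstm(x)                  # (batch, seq, 2 * hidden_dim)
            x = torch.cat([x, out], dim=-1)   # reuse features from all earlier layers
        return x

encoder = DenseBiLSTM(input_dim=320)
print(encoder(torch.randn(2, 15, 320)).shape)   # torch.Size([2, 15, 920])
```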

In order to obtain the semantic matching features of sentences, an attention mechanism is added to the model. The calculation of the attention mechanism is shown in formulas (3)–(5) [19], where F(·) denotes a feedforward neural network and the resulting weights represent the relative weight of each word in the attention matrix.
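A hedged sketch of such a cross-sentence attention step, in the spirit of formulas (3)–(5), is shown below: a feedforward projection F, dot-product scores between the instruction encoding and the reply encoding, and softmax weights that let each word attend to every word of the opposite sentence. The class name and projection size are illustrative.

```python
# Soft-alignment attention between instruction encoding c and reply encoding r.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim, proj_dim=200):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, proj_dim), nn.ReLU())  # F(.)

    def forward(self, c, r):
        # c: (batch, m, dim), r: (batch, n, dim)
        scores = torch.bmm(self.proj(c), self.proj(r).transpose(1, 2))  # (batch, m, n)
        attended_c = torch.bmm(torch.softmax(scores, dim=2), r)         # reply -> instruction
        attended_r = torch.bmm(torch.softmax(scores, dim=1).transpose(1, 2), c)
        return attended_c, attended_r

attn = CoAttention(dim=200)
ac, ar = attn(torch.randn(2, 15, 200), torch.randn(2, 12, 200))
print(ac.shape, ar.shape)   # torch.Size([2, 15, 200]) torch.Size([2, 12, 200])
```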

Two ways of connecting the attention mechanism are used. The first is single-layer attention: semantic features are extracted from the sentence vectors by the densely connected network, the feature vectors output by the network are then used to calculate the attention weights, and finally the calculated results are concatenated with the output vector of the densely connected network.

The second approach incorporates multiple layers of attention into the densely connected network. After the weight distribution over the words of the opposite sentence is obtained through the attention calculation, the attention-weighted vector is concatenated with the BiLSTM output, and the resulting semantic vector containing the matching relationship is taken as the input of the next layer. After the attention mechanism is added, the hidden states of the densely connected network are shown in formulas (6) and (7) [20], where H is the BiLSTM structure and l is the number of layers of the network. The input of the lth layer at time t is obtained by concatenating three parts: the input, the output, and the attention-weighted vector of the previous layer at the same time step.
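Building on the CoAttention module from the previous sketch, the second connection scheme can be sketched as follows: at every layer, the layer input, the BiLSTM output, and the attention-weighted opposite-sentence vector are concatenated and passed on, in the spirit of formulas (6) and (7). All names and sizes are illustrative assumptions.

```python
# Densely connected encoder with per-layer cross-sentence attention.
import torch
import torch.nn as nn

class DenseAttentiveEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dim=100, num_layers=3):
        super().__init__()
        self.lstms, self.attns = nn.ModuleList(), nn.ModuleList()
        dim = input_dim
        for _ in range(num_layers):
            self.lstms.append(nn.LSTM(dim, hidden_dim, batch_first=True,
                                      bidirectional=True))
            self.attns.append(CoAttention(2 * hidden_dim))   # from the sketch above
            dim = dim + 4 * hidden_dim    # input + BiLSTM output + attended vector

    def forward(self, c, r):
        for lstm, attn in zip(self.lstms, self.attns):
            hc, _ = lstm(c)
            hr, _ = lstm(r)
            ac, ar = attn(hc, hr)                   # cross-sentence attention
            c = torch.cat([c, hc, ac], dim=-1)      # dense + attentive connection
            r = torch.cat([r, hr, ar], dim=-1)
        return c, r

enc = DenseAttentiveEncoder(input_dim=320)
c, r = enc(torch.randn(2, 15, 320), torch.randn(2, 12, 320))
print(c.shape, r.shape)   # torch.Size([2, 15, 1520]) torch.Size([2, 12, 1520])
```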

Since the densely connected network suffers from an excessive number of parameters as it deepens, the final fully connected layer comes under excessive pressure. Therefore, an auto-encoder is added at the end of the network to compress the huge vector representation produced by the densely connected network while retaining the original information. The network structure after adding the multilayer attention and dimension-compression layers is shown in Figure 3.
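A minimal sketch of such a dimension-compression step is shown below, assuming a single-hidden-layer auto-encoder whose reconstruction loss encourages the compressed code to retain the original information; the widths are illustrative.

```python
# Auto-encoder that compresses the wide densely connected output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressionAutoEncoder(nn.Module):
    def __init__(self, in_dim, code_dim=300):
        super().__init__()
        self.encoder = nn.Linear(in_dim, code_dim)
        self.decoder = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        code = torch.relu(self.encoder(x))    # compressed representation
        recon = self.decoder(code)            # used only for the reconstruction loss
        return code, recon

x = torch.randn(4, 1220)                      # toy wide dense-network output
code, recon = CompressionAutoEncoder(1220)(x)
recon_loss = F.mse_loss(recon, x)             # auxiliary objective preserving information
print(code.shape, recon_loss.item())
```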

The semantic feature analysis layer processes and analyzes the output of the previous layer, extracts semantic matching features, and obtains the vectors used for semantic matching verification. The model performs maximum pooling and average pooling on the semantic vectors obtained from the previous layer and then applies an association strategy to the pooled vectors so that the integrity of the feature information is preserved while the features are transformed. The structure of the semantic feature analysis layer is shown in Figure 4.

First, the semantic vectors hc and hi of instructions and responses are processed by means of average pooling and maximum pooling. The calculation methods are shown in formulas (8)–(13):

The pooled semantic feature vectors of the command and the reply are thus obtained. In order to preserve the integrity of the feature information while transforming the features explicitly, an association strategy is used to represent the feature semantics; that is, the results of vector concatenation, element-wise addition, and element-wise subtraction are spliced together. The calculation method is shown in the following formula:
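The pooling and association strategy can be sketched as follows, in the spirit of formulas (8)–(14): average and max pooling over the time dimension for both sentences, followed by an association vector built from concatenation, element-wise addition, and element-wise subtraction. The dimensions are illustrative.

```python
# Semantic feature analysis sketch: pooling + association strategy.
import torch

def match_features(hc, hr):
    # hc: (batch, m, d) instruction encoding, hr: (batch, n, d) reply encoding
    vc = torch.cat([hc.mean(dim=1), hc.max(dim=1).values], dim=-1)
    vr = torch.cat([hr.mean(dim=1), hr.max(dim=1).values], dim=-1)
    # Concatenation, addition, and subtraction spliced into one matching vector.
    return torch.cat([vc, vr, vc + vr, vc - vr], dim=-1)

print(match_features(torch.randn(2, 15, 300), torch.randn(2, 12, 300)).shape)
# torch.Size([2, 2400])
```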

The task of the semantic matching verification layer is to judge whether the semantics of the instruction and the response match. In the model, a two-layer fully connected neural network completes the semantic matching verification: the matching vector is fed into the hidden layer to further mine the matching information between the instruction and the reply and obtain deep matching features. The semantic matching features are calculated through the two-layer fully connected network as shown in the following formula, where the two weight matrices belong to the hidden layer and the output layer of the fully connected network, the two bias terms belong to the hidden layer and the output layer, respectively, and α denotes the activation function, for which ReLU is used here.

Then, the matching score is normalized by the softmax function to obtain a probability vector, that is, the probability that the input sample belongs to each data category as identified by the model. The calculation method is shown in the following formula, where each component represents the probability that the sample to be discriminated belongs to category Ni (i = 1, 2, 3, 4).
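A hedged sketch of the verification layer is given below: a two-layer fully connected network with a ReLU hidden layer produces matching scores, and softmax turns them into probabilities over the four label classes; the hidden width and input size are assumptions.

```python
# Two-layer fully connected verification layer with softmax output.
import torch
import torch.nn as nn

class MatchClassifier(nn.Module):
    def __init__(self, in_dim, hidden_dim=512, n_classes=4):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden_dim)   # hidden-layer weights and bias
        self.out = nn.Linear(hidden_dim, n_classes)   # output-layer weights and bias

    def forward(self, v):
        return self.out(torch.relu(self.hidden(v)))   # unnormalized matching scores

clf = MatchClassifier(in_dim=2400)
probs = torch.softmax(clf(torch.randn(8, 2400)), dim=-1)   # per-class probabilities
print(probs.sum(dim=-1))                                   # each row sums to 1
```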

In the process of model training, each training instance is a sample, that is, a set of dialogues consisting of an instruction and a response. The cross-entropy function is selected as the optimization target. Through back-propagation and gradient descent, the parameters of the network are adjusted over multiple iterations so that the value of the cross-entropy function is minimized. The cross-entropy function is shown in the following formula, where the true label of the ith input sample is compared with the predicted label p(i).

The optimization algorithm used in model training is Adam, because it is efficient, requires little parameter tuning, and has clear advantages over other stochastic optimization algorithms. The Adam algorithm adapts to unstable objective functions by computing first-order and second-order moment estimates of the gradient and can design independent adaptive learning rates for different parameters. Adam combines the advantages of the root mean square propagation algorithm (RMSProp) and the adaptive gradient algorithm (AdaGrad) and can also handle sparse gradients and noisy problems.
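The training procedure described above can be sketched as follows, with toy stand-ins for the matching vectors and labels; the classifier shape, learning rate, and epoch count are illustrative assumptions.

```python
# Training-loop sketch: cross-entropy loss minimized with Adam.
import torch
import torch.nn as nn

features = torch.randn(64, 2400)              # toy matching vectors
labels = torch.randint(0, 4, (64,))           # the four label classes

model = nn.Sequential(nn.Linear(2400, 512), nn.ReLU(), nn.Linear(512, 4))
criterion = nn.CrossEntropyLoss()             # cross-entropy objective
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(features), labels)  # forward pass + loss
    loss.backward()                            # back-propagation
    optimizer.step()                           # Adam parameter update
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```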

In order to evaluate the classification results of the model, test accuracy is uniformly used as the evaluation index. The calculation method is shown in the following formula, where N represents the total number of samples in the test set and the numerator is the number of test samples whose semantic discrimination results are consistent with the true labels.
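A minimal sketch of this accuracy computation on toy model outputs:

```python
# Accuracy: fraction of predictions that agree with the true labels.
import torch

def accuracy(logits, labels):
    return (logits.argmax(dim=-1) == labels).float().mean().item()

logits = torch.randn(100, 4)             # toy model outputs for 100 test samples
labels = torch.randint(0, 4, (100,))     # toy true labels
print(accuracy(logits, labels))
```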

4. Analysis System of Mutual Translation of Chinese and Korean Literature Based on Semantic Analysis

According to the needs of the platform, this article designs a new technical architecture of the translation business management platform based on the SOA idea, as shown in Figure 5.

According to the needs of the platform, some functions are published as web services to facilitate extension and invocation by external systems. These functions are declared as web methods, packaged into specific service classes, and published in the form of web services, as shown in Figure 6.

On the basis of constructing a balanced dataset, ensemble learning is introduced. By comprehensively considering the classification results of multiple base classifiers, the classification accuracy of negative-class samples is reduced as little as possible while the classification accuracy of positive-class samples is ensured. The term recognition model is shown in Figure 7.

The overall architecture design diagram is shown in Figure 8.

The first part concerns thesaurus loading and system start-up: when the system starts, it queries whether the local thesaurus exists; if the database is being started for the first time, the thesaurus file is loaded, and otherwise the loading step is skipped and the program interface is opened directly. The second part is the main word-query function of the program interface: (1) a word or phrase is entered; (2) the word-query system automatically searches the input against the local thesaurus and displays the matching words in a drop-down list; (3) the query result is displayed, that is, after the word to be translated is clicked in the drop-down list, the program enters the result display interface and shows the part of speech and definition of the word stored locally.

The above constructs a Chinese-Korean literary translation system based on semantic analysis. Next, this study evaluates the system, analyzes the effect of the mutual translation of Chinese and Korean literary works, and simulates it on the MATLAB simulation platform; the results are shown in Table 1 and Figure 9.

From the above analysis, it can be seen that the Chinese-Korean literature mutual translation system based on semantic analysis proposed in this article has a good translation effect and can effectively promote Sino-Korean literature exchanges.

5. Conclusion

Contemporary linguistics and translation researchers have broken through the traditional research method of analyzing word by word and sentence by sentence and have expanded the translation unit to the discourse level. Translation research is no longer limited to the study of sentences in the source and target languages but widens its field of vision to the communicative function of context and language. For example, discourse analysis treats a text as a communicative activity rather than a set of stereotyped text structures, while pragmatics studies the use of language rather than language as an abstract system. The communicative and contextual nature of language therefore ties discourse analysis closely to translation. Knowledge of and research on discourse analysis can not only help us understand the original text correctly but also provide a theoretical basis for choosing an appropriate translation. This study combines semantic analysis technology to analyze the mutual translation of Chinese and Korean literature and explores such analysis through intelligent semantic analysis methods. The simulation results show that the Chinese-Korean literary mutual translation system based on semantic analysis proposed in this article achieves a good translation effect and can effectively promote Chinese-Korean literary exchange.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This study was sponsored by Konkuk University.