Abstract

The recognition results of the generalized maximum likelihood ratio detection (GLR) algorithm for model translation suffer from data point overlap. To address this, a fuzzy semantic optimal control intelligent recognition model for English translation based on an improved GLR algorithm is proposed. The algorithm builds a phrase corpus of about 710,000 tagged Chinese and English words, so that phrases can be retrieved automatically. Phrase structures are constructed around phrase centers to obtain part of speech recognition results, and the structural ambiguity between Chinese and English in these results is corrected according to the syntactic functions recorded in the analytic linear list. The content of the identifier is finally obtained on the basis of a comprehensive evaluation. The recognition accuracy of the improved algorithm exceeds 95%, and its overall score is 92.3. The improved algorithm overcomes the shortcomings of the original GLR algorithm; compared with the statistical algorithm and the dynamic memory algorithm, it improves operation speed and processing performance and is better suited to machine translation tasks, providing a new idea for the field of machine translation.

1. Introduction

In the context of globalization, English has become an international language and is used ever more widely, yet every industry faces a shortage of people with strong English skills. In view of China's market demand, attention should be paid to students' English language ability [1]. Among these abilities, spoken English is key: to use English in daily communication, pronunciation correction and learning must be strengthened. In the past this required substantial human resources and was difficult to implement, so developing an intelligent recognition model for English translation is particularly important. Such a model can monitor students' English pronunciation and provide corrective suggestions. Traditional intelligent recognition models for English translation have certain problems: they cannot accurately recognize English translation or correct students' pronunciation, and they may even mislead students' pronunciation and hinder English learning [2]. To address these problems, this paper studies an intelligent recognition model for English translation based on fuzzy semantic optimal control of the GLR algorithm, so as to recognize English translation automatically.

By analyzing the characteristics of English texts in the machine translation unit, the PTA unit system model realizes English-Chinese translation and the translation of short texts oriented to an English-Chinese word corpus; several PTA models are explained in detail, revealing the importance of the corpus [3, 4]. When measuring translation similarity in the model, the cosine similarity method is used to obtain the semantic similarity of two vectors, and a dependency tree based on HowNet lexical semantic similarity and a log-linear model is designed. Providing language-dependent structured processing ensures the correspondence between Chinese and English. The semantic similarity between the sentence to be translated, given as input to the HowNet computation, and the vocabulary generated in the case base is calculated, which further improves the accuracy of translation and makes the translation results more reliable.

The structure-based translation model is the earliest phrase-based model: it first gives a rough alignment between source-language and target-language phrases and then a detailed alignment between the words within each phrase [6], using the EM algorithm for parameter estimation. Structure-based translation models are deeply influenced by the IBM models, with similarities in both modeling and parameter estimation; in particular, sub-models for phrase segmentation and alignment are also established [7]. As a result, the model is very complex, and the complexity of parameter estimation and search is high. Alignment templates [8] replace the words in phrases with word classes to achieve generalization. This is a special phrase translation method; most other researchers adopt direct phrase translation, in which the target-language phrase of a bilingual phrase pair is regarded directly as the translation of the source-language phrase. The alignment template model greatly simplifies phrase segmentation and phrase reordering: a unique phrase segmentation is adopted, and phrase reordering depends only on the position of the previous phrase [9]. The model no longer treats word alignment as a hidden variable and cannot perform word alignment on a bilingual corpus itself, so the training data must be a word-aligned bilingual corpus. Its parameter estimation abandons the traditional EM-based method [10] in favor of a simpler estimation based on relative frequency, greatly reducing the complexity of parameter estimation. This was a major change, and many subsequent translation models adopt a similar approach: they do not generate word alignments themselves but estimate parameters directly from a word-aligned bilingual corpus, which is the main reason the approach is now so widely used. The advantage of the beam search algorithm is that various pruning strategies can be used to balance efficiency and accuracy [11], and many statistical machine translation systems have since adopted it. The log-linear model was introduced into statistical machine translation: it models the translation probability directly and treats the various knowledge sources as feature functions [12]. Its main advantage is that it can easily integrate diverse knowledge sources and automatically tune the weights among them. Instead of the alignment template's word-class generalization, lexical weights were proposed to enrich the parameter estimation of the phrase-based translation model [13]. Pharaoh, developed by Philipp Koehn, is the most influential freely available phrase-based statistical machine translation system in the public domain. The idea of multi-engine machine translation was also proposed, and the Pangloss Mark III machine translation system was designed on this basis [14]. The system combines rule-based, example-based, and lexicon-based machine translation methods. Its main design idea is as follows: the input sentence is received and its fragments (phrases and words) are translated in parallel by multiple translation engines; each translation unit is stored in a chart and scored according to certain criteria; finally, a dynamic programming algorithm is used to obtain the optimal translation result.
Appropriate scoring criteria determine the selection of the best translation results. The Pangloss system scores translation results with a combination of manual evaluation and heuristic evaluation. Brown subsequently added statistical models to the multi-engine machine translation system [15], using an n-gram model to select candidate results. This reduced the amount of manual evaluation of translation results in the Pangloss system, and a predictive statistical model was used to guide the evaluation and selection of multi-engine translation results [16, 17]. The recurrent neural network imitates the human translation process and has, to a certain extent, good modeling ability for time series. However, neither the original RNN nor the later LSTM, GRU, and related variants are free from the sequential constraints of the network, so they cannot be trained in parallel, which makes training very inefficient. Moreover, they cannot learn effective global information in long sentences, and the difficulty of long-distance dependence has not been fully solved.

Summarizing the above literature, intelligent phrase recognition technology can effectively reduce grammatical ambiguity. Structural ambiguity remains a difficult point in current English translation and needs to be solved with speech recognition algorithms. This paper adopts generalized maximum likelihood ratio detection based on an improved GLR algorithm. The algorithm constructs an English phrase corpus of about 740,000 tags, which makes phrases searchable. The first part is the introduction; the second part presents the fuzzy semantic optimal control intelligent recognition model for English translation based on the improved GLR algorithm; the third part is the experimental design; the fourth part is the experimental verification; and the fifth part is the conclusion.

2. Improved GLR Algorithm Fuzzy Semantic Optimal Control English Translation Intelligent Recognition Model

2.1. Intelligent Recognition Model for English Translation

The corpus plays an important role in the intelligent translation model. Bilingual phrases in the corpus store the data, accurately label the parts of speech of short English and Chinese phrases, and standardize the function of each phrase, improving the timeliness and accuracy of the automatic phrase recognition algorithm in English-Chinese machine translation and making the translation more accurate. Figure 1 shows the information flow of the phrase corpus.

The phrase corpus of the English translation intelligent recognition model designed in this paper contains more than 700,000 words, which meets practical needs. As shown in Figure 1, the phrase corpus is highly targeted: the tenses of the different phrase corpora are distinguished comprehensively, and both English and Chinese phrases are marked. The corpus annotation consists of three parts: hierarchy, data, and processing. The data are stored in text format; the hierarchy layer uses alignment and part of speech information; and the processing layer relies on active human-machine communication and direct interaction.
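As an illustration only, the sketch below shows what one aligned, part of speech tagged entry of such a phrase corpus might look like and how it could be queried; the field names, tag set, and lookup function are assumptions for this sketch, not the actual schema of the corpus described here.

```python
# Minimal sketch of one bilingual phrase-corpus entry (illustrative only; the
# field names and tag set are assumptions, not the paper's actual schema).
entry = {
    "id": 4217,
    "data": "text",                                   # data layer: plain-text storage
    "en": [("machine", "NN"), ("translation", "NN")], # POS-tagged English phrase
    "zh": [("机器", "n"), ("翻译", "n")],               # POS-tagged Chinese phrase
    "alignment": [(0, 0), (1, 1)],                    # hierarchy layer: word alignment
    "tense": "none",                                  # tense marking mentioned above
    "verified": True,                                 # processing layer: human-machine check
}


def lookup(corpus, en_phrase):
    """Return corpus entries whose English side matches the query phrase."""
    words = en_phrase.lower().split()
    return [e for e in corpus if [w for w, _ in e["en"]] == words]


print(lookup([entry], "machine translation"))
```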

Part of speech recognition is particularly important in intelligent machine recognition algorithms, as it resolves the grammatical ambiguity of a large number of phrases, sentences, and words. Words in short sentences can be segmented using the contents of the tagged phrase corpus. Since words in English sentences are independent, the Chinese word segmentation process is carried out to judge the translated sentences and their parts of speech. Finally, the dependency relationships between phrases are analyzed to generate the syntactic tree of the sentence.

Generally speaking, the accuracy of existing part of speech recognition is unsatisfactory because the recognition results of the GLR algorithm have a high probability of coincidence. The improved GLR algorithm computes the likelihood D of the phrasal context W using the quaternion cluster computation S:

In order to obtain the best translation, a set of criteria is needed, and the criteria are as follows:
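As an assumption about their general form (not the paper's exact formula), a standard decision rule consistent with the log-linear framework referenced elsewhere in the paper selects the translation that maximizes a weighted sum of feature functions:

\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

where f is the source sentence, e a candidate translation, h_m the feature functions (translation model, language model, reordering, and so on), and \lambda_m their weights.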

2.2. Improved Fuzzy Semantic Optimal Control of GLR Algorithm

The construction of the translation model mainly includes four parts: word alignment, word scoring, phrase extraction, and phrase scoring. Training the basic phrase translation model takes the bilingual aligned corpus as input and produces a phrase translation table. Phrase scoring consists of two parts, bidirectional phrase probability calculation and dictionary probability calculation, where the dictionary probability refers to the probability of the within-phrase dictionary entries in the corpus. The final phrase translation table has four components: the phrase translation model probability, the reverse phrase translation model probability, the phrase dictionary model probability, and the reverse phrase dictionary model probability. The extracted phrases can be evaluated through the dictionary probability calculation. The training process of the basic phrase translation model is shown in Figure 2.
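As a sketch of the phrase-scoring step, under the assumption of relative-frequency estimation and with the dictionary (lexical) probabilities omitted, the toy example below computes the forward and reverse phrase translation probabilities from a set of extracted phrase pairs:

```python
from collections import Counter


def phrase_table(phrase_pairs):
    """Relative-frequency estimates of the forward p(t|s) and reverse p(s|t)
    phrase translation probabilities (dictionary/lexical weights omitted)."""
    pair_count = Counter(phrase_pairs)
    src_count = Counter(src for src, _ in phrase_pairs)
    tgt_count = Counter(tgt for _, tgt in phrase_pairs)
    return {
        (src, tgt): {
            "p_t_given_s": n / src_count[src],   # forward phrase probability
            "p_s_given_t": n / tgt_count[tgt],   # reverse phrase probability
        }
        for (src, tgt), n in pair_count.items()
    }


# Toy extracted phrase pairs; in practice these come from the word-aligned corpus.
pairs = [
    ("机器 翻译", "machine translation"),
    ("机器 翻译", "machine translation"),
    ("机器 翻译", "automatic translation"),
    ("翻译 系统", "translation system"),
]
for pair, probs in phrase_table(pairs).items():
    print(pair, probs)
```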

A variety of linguistic information (such as surface form and part of speech) is incorporated into the factored phrase translation model to enrich it. The factored model allows a standard n-gram model over each factor, called a sequence model, analogous to the n-gram language model:
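As an assumption consistent with the description above, a standard n-gram sequence model over a factor stream f_1, ..., f_T (for example, the part of speech factor) takes the same form as an n-gram language model:

p(f_1, \ldots, f_T) = \prod_{t=1}^{T} p(f_t \mid f_{t-n+1}, \ldots, f_{t-1})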

Based on the fuzzy semantic correlation degree classification method, a procedure is proposed that continuously checks and corrects the classification results and gradually builds the classifier. The aim is that, by setting and modifying the feature fuzzy sets describing each category during training, the classifier correctly classifies most of the training texts and ultimately classifies the test texts well. The training process is briefly described as follows. First, the fuzzy semantic correlation degree described in the previous section is used to classify the training texts and obtain classification results. If a training text is assigned to the wrong category, i.e., the first of the two erroneous outcomes, the membership degrees of the shared and semantically similar words in the feature fuzzy set describing the text and in the feature fuzzy set describing that category are reduced. Conversely, if the text is not assigned to any category, i.e., the second of the two outcomes, the membership degrees of the shared and semantically similar words in the category feature fuzzy set and the text feature fuzzy set are increased. In this way, the optimal membership degree of each feature word in each category's feature fuzzy set is found and the fuzzy set representation of the category is adjusted. The training algorithm is repeated until the performance of the classifier no longer improves significantly. Figure 3 shows the flow chart of the training algorithm.
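A minimal sketch of the two correction cases described above is given below, assuming a simple additive adjustment of membership degrees; the correlation score, the fixed step size, and the data structures are illustrative assumptions rather than the paper's actual definitions.

```python
def classify(text_fuzzy_set, category_sets, threshold=0.5):
    """Return the best-matching category, or None if no score reaches the threshold.
    The correlation score here is a simple sum of min-memberships over shared words."""
    scores = {
        cat: sum(min(mu, cat_set.get(word, 0.0))
                 for word, mu in text_fuzzy_set.items())
        for cat, cat_set in category_sets.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def update_memberships(text_fuzzy_set, true_cat, category_sets, step=0.05):
    """One pass of the correction rule: lower memberships of shared words in a
    wrongly chosen category, raise them in the true category if nothing matched."""
    predicted = classify(text_fuzzy_set, category_sets)
    if predicted is not None and predicted != true_cat:
        # Case 1: classified into the wrong category -> decrease memberships.
        for word in text_fuzzy_set:
            if word in category_sets[predicted]:
                category_sets[predicted][word] = max(
                    0.0, category_sets[predicted][word] - step)
    elif predicted is None:
        # Case 2: not classified into any category -> increase memberships.
        for word, mu in text_fuzzy_set.items():
            cur = category_sets[true_cat].get(word, 0.0)
            category_sets[true_cat][word] = min(1.0, cur + step)
    return category_sets


# Tiny demo with two hypothetical categories and one training text.
cats = {"sports": {"match": 0.8, "team": 0.7}, "finance": {"market": 0.9, "match": 0.4}}
text = {"match": 0.9, "team": 0.6}
update_memberships(text, true_cat="sports", category_sets=cats)
```

In training, the update would be applied repeatedly over the training texts until the classifier's performance no longer improves, as in the flow chart of Figure 3.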

2.3. Part of Speech Recognition in English Translation Model Phrase Corpus

Part of speech recognition of phrases is a key step in the intelligent recognition algorithm for machine translation, as it resolves the grammatical ambiguity of a large number of sentences, phrases, and words. In part of speech tagging over the phrase corpus, each sentence is divided into several words. For English sentences each word is independent, while the corresponding Chinese sentence requires word segmentation. The processed words are aligned to form phrases, and parts of speech are marked according to the sentence context of the translation. Finally, the syntactic tree of the sentence is formed by analyzing the dependencies between phrases. This method improves the timeliness and accuracy of machine translation and significantly increases the processing capacity of the phrase corpus. The GLR algorithm is commonly used in part of speech recognition, mainly to judge the contextual relationships of phrases; its core is dynamic recognition based on parse tables and unconditional transfer statements.

The classic GLR algorithm operates at each step through shift and reduce operations, and the beginning and end of each operation are marked with delimiter symbols. In the process of phrase translation, when the GLR algorithm detects no grammatical ambiguity, it restarts the recalibration and calibration operation. If syntactic ambiguity is detected, the analytic linear list is extracted from the geometric linear table of the syntactic analysis, recognition of the phrase content is expanded, the optimal content is provided according to the principle of local optimality and sent to different recognition channels for symbol recognition, and the optimal result is selected from the recognition results.
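To make the shift/reduce behaviour and the branching on ambiguity concrete, the sketch below is a deliberately simplified GLR-style parser over a toy grammar: whenever a shift and a reduction (or several reductions) are both possible, the parse state is forked and every alternative is pursued. The toy grammar, tag set, and example sentence are assumptions for illustration; a full GLR parser would additionally use a precomputed parse table and a graph-structured stack.

```python
# Deliberately simplified GLR-style parsing: when shift and reduce (or several
# reductions) are all possible, the state is forked and every branch is pursued.
GRAMMAR = [                      # toy grammar (left-hand side, right-hand side)
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("VP", ("VB", "NP")),
    ("VP", ("VP", "PP")),        # source of the PP-attachment ambiguity
    ("S",  ("NP", "VP")),
]


def parse(tags):
    """Return the reduction history of every complete analysis (stack == ('S',))."""
    states = [((), tuple(tags), ())]      # (stack, remaining input, reductions so far)
    analyses = []
    while states:
        stack, rest, hist = states.pop()
        if not rest and stack == ("S",):  # a complete analysis has been found
            analyses.append(hist)
            continue
        forks = []
        for lhs, rhs in GRAMMAR:          # every applicable reduction is one branch
            if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                step = f"{lhs} -> {' '.join(rhs)}"
                forks.append((stack[:-len(rhs)] + (lhs,), rest, hist + (step,)))
        if rest:                          # shifting the next symbol is another branch
            forks.append((stack + (rest[0],), rest[1:], hist))
        states.extend(forks)              # dead ends simply produce no branches
    return analyses


# "the man saw the dog with the telescope" as a part-of-speech tag sequence:
for analysis in parse(["DT", "NN", "VB", "DT", "NN", "IN", "DT", "NN"]):
    print(analysis)
```

For the ambiguous prepositional-phrase attachment in the example, two complete analyses are returned, corresponding to the case in which the algorithm must send alternatives to different recognition channels and select the optimal result.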

During the operation of the improved GLR algorithm, the type of the pointer is identified before the terminal symbol is replaced. If it is a reduction pointer, the constraint conditions of the pointer are checked in the phrase corpus; otherwise, control passes directly to the termination pointer. A termination pointer usually appears at the backup point of a structural ambiguity. After the English translation reaches the termination pointer, a phrase structure tree is formed; the symbol stack is then marked, the center symbol of the backup point is examined, and the phrase is placed in the correct sentence structure. The corrected part of speech recognition results are shown in Table 1.

The dependency structure can replace the phrase structure with rich lexical information because it analyzes predicate information, which is very important in applications, more effectively. To obtain dependency information, the dependencies of the sentence must first be analyzed. Selecting the feature functions is the key step of dependency analysis; the right set of feature functions can train a good analyzer. The basic features are shown in Table 2.

The first category comprises the word information between the head word and the word it modifies. These features form a ternary model of part of speech tags: the head part of speech tag, the modifier part of speech tag, and the intermediate part of speech tags. They greatly improve the accuracy of finding the head word of a noun, because they lower the score when both endpoints are nouns and the intermediate word is a verb, a linguistic phenomenon that is very rare.

The second category of features is the contextual word information of parent-child node pairs. This is a quaternary model: the POS tag of the parent node, the POS tag of the child node, the POS tag of the word before or after the parent node, and the POS tag of the word before or after the child node. There are also back-off models, i.e., the ternary models formed by dropping one of these features. The addition of this information is independent of the dependency label of the current edge.

Adding these two kinds of features greatly improves the quality of the analysis tree. According to probabilistic valency theory, valency is a force and therefore has a magnitude. The valency of a word class can be described qualitatively by differences in the kind and number of dependency relations the word class can govern or be governed by, or more accurately by quantitative statistics obtained from a corpus. We therefore add two further important types of feature information.

The first type adds sibling node information in the dependency tree, a ternary model: the sibling node's part of speech or dependency tag, the word's part of speech or dependency tag, and the number of sibling nodes. The number of sibling nodes uses a back-off model.

Verbs play an important role in sentences; accordingly, verbs account for more than 90% of the root nodes in the Chinese dependency treebank, so handling verbs well benefits the analysis. Subject-verb-object is the backbone of a sentence, so extracting the subject, verb, and object of a sentence is of great significance for dependency analysis. According to the combination characteristics of Chinese sentences, when a verb is the predicate, its leftmost and rightmost sub-words are respectively the subject and object it governs. The second category of added features is therefore a ternary model: the leftmost sub-word, the verb, and the rightmost sub-word.

Based on the above feature information, the dependency parsing model is constructed from these feature functions, and a dependency parser is designed to perform parsing and obtain the dependency information.
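As an illustration of how such feature functions might be written, the sketch below emits feature strings for a candidate head-modifier edge following the four feature classes described above (in-between POS tags, context POS tags, sibling information, and the verb's leftmost/rightmost children); the exact string encodings and tag conventions are assumptions, not the actual feature set of the parser.

```python
def edge_features(sent_pos, head, mod, children):
    """Emit feature strings for a candidate dependency edge head -> mod.
    sent_pos: list of POS tags for the sentence; children: dict mapping a head
    index to the sorted indices of its current children."""
    feats = []
    h_pos, m_pos = sent_pos[head], sent_pos[mod]

    # (1) Ternary template: head POS, modifier POS, and each POS in between.
    lo, hi = sorted((head, mod))
    for k in range(lo + 1, hi):
        feats.append(f"between:{h_pos}|{m_pos}|{sent_pos[k]}")

    # (2) Quaternary context template: POS of the words adjacent to head and
    #     modifier, plus one back-off ternary obtained by dropping a feature.
    h_prev = sent_pos[head - 1] if head > 0 else "<s>"
    m_next = sent_pos[mod + 1] if mod + 1 < len(sent_pos) else "</s>"
    feats.append(f"context:{h_pos}|{m_pos}|{h_prev}|{m_next}")
    feats.append(f"context_backoff:{h_pos}|{m_pos}|{h_prev}")

    # (3) Sibling template: sibling POS, modifier POS, and number of siblings.
    siblings = [c for c in children.get(head, []) if c != mod]
    for s in siblings:
        feats.append(f"sibling:{sent_pos[s]}|{m_pos}|{len(siblings)}")

    # (4) Verb template: leftmost child, verb head, rightmost child, which
    #     approximates subject-verb-object extraction for verbal predicates.
    if h_pos.startswith("V") and children.get(head):
        left, right = min(children[head]), max(children[head])
        feats.append(f"svo:{sent_pos[left]}|{h_pos}|{sent_pos[right]}")

    return feats


# Example: candidate edge from the verb (index 2) to "dog" (index 4) in
# "the/DT man/NN saw/VB the/DT dog/NN with/IN the/DT telescope/NN".
tags = ["DT", "NN", "VB", "DT", "NN", "IN", "DT", "NN"]
print(edge_features(tags, head=2, mod=4, children={2: [1, 4]}))
```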

2.4. Experimental Design

Assessment: the three English-Chinese machine translation systems translate the specified phrases and 50 random network sentences; professional translators translate the same specified phrases and 50 random network sentences; the graders then compare the outputs of the three machine translation algorithms with the human translations and score them according to the rules shown in Table 3.

3. Results and Analysis

In this evaluation experiment, 50 phrases and 50 random network sentences were recognized. According to the test results in Figure 4, machine translation based on the improved GLR algorithm achieves the best recognition accuracy, recognition speed, and updating ability. As can be seen from the comprehensive evaluation results in Figure 5, the highest score, 92.5 points, is obtained by the improved GLR algorithm, and the lowest score, 76.2 points, by the statistical algorithm. There is little difference between the dynamic memory algorithm and the improved GLR algorithm in the final test score; the main difference between the two lies in the updating ability score. Combining Figures 4 and 5, the improved GLR algorithm has obvious performance advantages over the other algorithms.

In the "explanation" item of the evaluation, machine translation based on the improved GLR algorithm is the closest to human translation. It can be clearly seen that the statistical machine translation algorithm based on the improved GLR algorithm designed in this paper is more accurate than the dynamic translation memory algorithm; its recognition accuracy reaches more than 95%, which is comparable to the level of human translation.

In practice, the models in the general model library may not exactly match the target translation mode. With other conditions unchanged, model 2 and model 3 in the model library are therefore simulated again, as shown in Figure 6.

The experimental results of the first part show that the fuzzy semantic correlation degree classification algorithm has clear advantages in both precision and recall, and on this basis this paper proposes a membership degree updating algorithm. This part of the experiment analyzes and tests the classification performance of the fuzzy semantic correlation degree classification algorithm when the membership degree updating algorithm is used. From the analysis of the first part of the experimental results, we conclude that text features and category representation are still the main factors affecting the performance of the classification system. The membership updating algorithm in this paper obtains, from the existing training text set, the optimal key fuzzy sets describing the classification system; in this sense, the feature-related factors affecting the performance of the classification system are minimized.

After training on the training set, the fuzzy feature vectors of all categories in the category system are known. A closed test is then carried out on the training set, and the membership values of the key words in the vectors are checked and modified according to the test results. This process of closed testing and correction is repeated until the classifier becomes stable. At that point, the mF value of the classifier reaches the optimum of the closed test, and the fuzzy set representation that approximately describes the semantics of each category also reaches a stable, optimized state. Figure 7 shows the curves of micro-averaged precision, recall, and F1 value in the closed test as the value of A varies, after the classifier performance has stabilized.

In terms of English translation time, the expert instances in the ExerptFDO showed a nearly linear increase across the three types of English translation. Semantic English translation clearly required the most time relative to the expert instances in the ExerptFDO, but the time spent was on the order of milliseconds, which is acceptable to users, as shown in Figure 8.

When the user's preference setting for semantic English translation is (0.5, 0.5), the user has the same degree of preference for the two English translation conditions, i.e., fuzzy semantic English translation without user preference. When the preferences are set to other values, different numbers of qualified experts are returned to the user. Under semantic English translation conditions such as fuzzy semantic English translation and preference semantic English translation with different preference settings, the influence of different matching thresholds on the English translation results is shown in Figure 9.
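A minimal sketch of how the preference weights and the matching threshold could interact is given below; the two condition scores, the linear combination, and the threshold value are placeholder assumptions rather than the system's actual matching logic.

```python
def preference_match(fuzzy_score, semantic_score, prefs=(0.5, 0.5), threshold=0.6):
    """Combine the two English-translation conditions with user preference weights
    and keep only candidates above the matching threshold.
    prefs=(0.5, 0.5) corresponds to the no-preference setting described above."""
    w_fuzzy, w_sem = prefs
    combined = w_fuzzy * fuzzy_score + w_sem * semantic_score
    return combined, combined >= threshold


# Example: one candidate scored under each condition, with and without preference.
print(preference_match(0.82, 0.64))                     # equal preference (0.5, 0.5)
print(preference_match(0.82, 0.64, prefs=(0.8, 0.2)))   # fuzzy condition preferred
```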

Figures 10 and 11 both show the distribution of the node control points identified by the systems. The compact distribution of node control points in Figure 10 indicates that the proofreading results are more accurate and that the problem of contextual incoherence is alleviated. In Figure 11, the syntax- and phrase-based recognition system has a loose distribution of node control points overall, but a compact distribution in the 1st, 4th, and 5th experiments, indicating that the system has high calibration accuracy but poor coherence in the early warning of translation results. Moreover, the alternation between loose and compact distributions of its node control points indicates that it is not stable.

4. Conclusion

To address the difficulty of structural ambiguity in English translation, an improved GLR algorithm was proposed to overcome the data point overlap of the traditional GLR algorithm. The improved GLR algorithm uses the phrase center point to design the phrase structure and corrects structural ambiguity, effectively alleviating the low recognition accuracy of the traditional statistical algorithm and dynamic memory algorithm and assigning the most reasonable position to each recognized phrase. Experimental results show that, compared with the other algorithms, the machine translation algorithm based on the improved GLR algorithm is simple and fast to compute, less difficult to apply, and more practical, making it suitable for English machine translation.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by School of Foreign Languages, Henan University of Animal Husbandry and Economy.