Abstract

Today’s society is developing rapidly and the volume of information is growing explosively; language is a key carrier in transmitting that information. Among the languages of communication, English occupies an important position and is one of the most commonly used in social life, so the practical significance of English education is self-evident. With the popularization of the Internet, intelligent phrase recognition has become a key technology in machine translation. Using natural language processing, an English translation corpus can be built that accurately marks the parts of speech of short words, and phrase recognition is then used to correct grammatical ambiguity. Structural ambiguity is a difficult problem in English translation. In this paper, a random matrix model based on an improved GLR algorithm is established, and phrase structure labelling is constructed from the phrase corpus. The revised annotation improves the accuracy of academic translation, and intelligent English translation is realized through recognition technology. Simulation experiments verify the effectiveness of the model, and the results show that the intelligent recognition model for English translation has high proofreading accuracy. When the value of P is 0.95, high accuracy is retained to the greatest extent, which supports the efficiency and feasibility of the improved GLR algorithm in machine translation.

1. Introduction

Against the background of globalization and internationalization, English, as a universal language, is used more and more widely. English is known for its huge vocabulary, and as technical terms continue to be introduced, people with strong English ability are in short supply. With the continuous development of information technology, English teaching needs to attach importance to students’ communicative competence [1]. The most essential function of English is daily communication, so learners should strengthen their English ability by correcting pronunciation and accumulating language knowledge. Traditional teaching relies mainly on teachers, which requires a lot of human resources. Therefore, the design of intelligent English translation models has become a research hotspot. Traditional English translation models cannot recognize phrases accurately, which has a negative impact on English translation [2]. Language is a bridge of communication, and better translation between Chinese and English is an important means of improving efficiency and strengthening communication. Among language pairs worldwide, English-Chinese translation is both a focus and a difficulty of machine translation. Chinese is very complex in its semantics and language structure: different combinations of Chinese characters have different meanings, and their interpretation depends heavily on context. The comprehensive informatization of social life is the main development trend of today’s society. As an important information carrier, English occupies a very important position among communication languages and is one of the most commonly used languages in social life. According to relevant data, the number of ESL learners around the world has far exceeded 1.5 billion [3]. Good learning results are inseparable from the correction of grammatical errors.
Students who want to improve their English grammar quickly often fall back on knowledge such as Chinese semantic structure to fill the gaps when they do not have a thorough grasp of English grammatical semantics and sentence structure [4]. However, language learning is influenced by the habits of the mother tongue, so learners inevitably make some typical grammatical mistakes and need to reflect on them. Using computer programs to correct grammatical errors in text automatically is therefore of great practical significance for English translation. Given the current development of natural language processing technology, it is increasingly feasible to design a grammar error correction model that is highly accurate and covers many error types. Natural language processing (NLP) uses computer programs to analyze large amounts of natural language data [5]. Linguistics is closely related to translation, so the study of English translation models has become an inevitable trend of academic development.

English translation applications mainly focus on the translation of academic literature, search engines, and other foreign-language material, so intelligent translation technology has strong application demand and good development prospects. Traditional research has a number of defects, which hinder the development of intelligent translation technology. In actual machine translation products, translation quality varies greatly [6], which shows that some machine translation systems can no longer meet current translation needs. The results of computer-aided translation are usually taken as a reference, and the user is left to judge the quality of the translation [7]. The rational use of a professional vocabulary corpus not only reduces the workload of translation but also improves its accuracy. Song analyzed the discourse units used in English machine translation, taking the NT clause as the basic unit, and used a PTA unit model to carry out English-Chinese translation at the clause level together with corpus-oriented construction; the detailed explanation of the PTA model reveals the importance of the corpus [8]. For statistical English translation based on vector-mixture phrase composition semantics, Shen and Qin used cosine similarity to measure the semantic similarity of two vectors in the translation similarity model and used a weighted vector method to identify the differences between them, so as to ensure the quality of the translation results [9]. Shu used a pipeline-based, layer-by-layer analysis technique to analyze machine translation, compared parts of speech and syntax from the phrase corpus, and obtained the syntactic structure of the English to be translated.
A semantic similarity model was designed to preserve the corresponding bilingual corpus in a tree-to-string Chinese-English dependency structure and to ensure Chinese-English bilingual correspondence [10]. Calculating the semantic similarity of words in the same language is necessary to improve translation accuracy.
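The cosine similarity measure mentioned above can be illustrated with a minimal sketch. The function name `cosine_similarity` and the convention of returning 0.0 for a zero vector are assumptions made for this illustration; they are not taken from the cited work.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors that point in the same direction score close to 1 and orthogonal vectors score 0, which is why the measure is insensitive to vector length and suits comparing phrase embeddings of different magnitudes.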

With the gradual progress of scientific modernization and global integration, mutual translation between the world’s languages has become a necessary condition for global communication, and translation is a shortcut for international language communication. Word-for-word translation by human interpreters can no longer keep up with the high-speed information society. How to transmit information across languages more efficiently, accurately, and conveniently has become the focus of language translation research. Several models have been compared and analyzed against the experimental results of various baseline systems. Based on the seq2seq structure, Duan et al. constructed a nested attention layer with word-level and character-level attention: global grammar and sentence fluency are corrected with word-level attention, spelling errors are corrected with character-level attention, and GRU units are used in both the encoder and the decoder [11]. Xia used a multilayer CNN and an attention mechanism to correct grammar and collocation errors, initialized the word embeddings through pretraining, and introduced a multimodel ensemble strategy [12]. Borsotti et al. adopted a hybrid model of SMT and NMT and added an additional neural language model based on the finite-state transducer (FST) method [13]. Sheriff et al. designed an unsupervised method that generates parallel sentence pairs by edit distance and used marked error data to fine-tune the model, which uses the transformer structure and integrates a language model and a reordering method [14]. The literature above shows that a phrase carrying semantics is usually the core content of a sentence and that intelligent phrase recognition is an important link in the whole recognition process: once a phrase is identified, its part of speech and syntax can be analyzed, and the translation can be assembled automatically from the phrase corpus.
For existing grammar models, the lack of large-scale, high-quality corpora is still a major factor limiting performance, and how to reduce the number of model parameters to save computational resources remains an open problem.

Building on previous studies, this paper constructs an English translation corpus that accurately labels the parts of speech of short words and establishes a random matrix model based on the improved GLR algorithm. A series of simulation experiments is carried out to verify that the approach improves the accuracy of academic translation.

3. Improved GLR Algorithm for English Translation Model

3.1. Create an English Translation Corpus

This paper applies a corpus design to the intelligent English translation model. The corpus stores bilingual phrase information and accurately marks the parts of speech of short words in Chinese and English [14]. In this design, the function of each phrase is standardized, which improves the timeliness and accuracy of the automatic phrase recognition algorithm [15]. The information flow of the English translation corpus is shown in Figure 1. The model converts long sentences into multiple pairs of short words and then uses a scoring algorithm to evaluate the matching corpus entries. Finally, once the translation context and the merits of the corresponding translation phrases have been confirmed, the scores are used to form the intelligent translation result. The overall quality of the phrase corpus is therefore of great importance to the machine translation algorithm.

The model structure in this paper is based on an intelligent recognition phrase corpus for English translation that contains 750000 words; 23000 of the 13000 sentences and phrases can satisfy the structure of demand. The phrase corpus shown in the figure is targeted: it is the English-Chinese translation phrase corpus used in this paper, and the English and Chinese phrases are annotated to distinguish the tenses of different phrases. The corpus marking method is composed of the data, the hierarchy, and the processing method. Text format is adopted for the data type, part of speech and alignment are selected as the data levels, and man-machine interaction is adopted for processing. Using the English-Chinese translation corpus information, phrases are annotated with scale, application field, expectation system, grammatical tense, and other contents. The agent processes the phrases through a series of strategies to obtain the translated text with the maximum expectation.
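A corpus entry of the kind described above, together with a toy scoring step, might be sketched as follows. The class `PhraseEntry`, its field names, the word-overlap score, and the sample entries are all hypothetical; the scoring algorithm actually used is not specified here.

```python
from dataclasses import dataclass


@dataclass
class PhraseEntry:
    """One bilingual phrase with its annotations (illustrative fields only)."""
    english: str
    chinese: str
    pos_tags: list            # one part-of-speech tag per English word
    tense: str = "present"    # grammatical tense annotation
    domain: str = "general"   # application field annotation


def score_match(query_words, entry):
    """Toy score: fraction of the entry's words that occur in the query."""
    entry_words = entry.english.lower().split()
    overlap = sum(1 for word in entry_words if word in query_words)
    return overlap / len(entry_words)


def best_translation(sentence, corpus):
    """Return the best-scoring entry, or None if nothing overlaps."""
    words = set(sentence.lower().split())
    scored = [(score_match(words, entry), entry) for entry in corpus]
    score, entry = max(scored, key=lambda pair: pair[0])
    return entry if score > 0 else None
```

A realistic scorer would also weigh context, tense, and domain annotations; the overlap ratio is used only to keep the sketch short.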

3.2. Identify Parts of Speech in English Translation Corpus

In machine intelligent recognition algorithms, part-of-speech recognition is the most important part; it deals with the grammatical ambiguity of a large number of phrases, sentences, and words. The labelled phrase corpus is used to classify the words in short sentences. Words in English sentences are independent; after word segmentation, the parts of speech of the sentence to be translated are judged and a sentence syntax tree is built. This method improves the accuracy and effectiveness of the machine translation process and the ability to process the phrase corpus [16]. The GLR algorithm determines the context of phrases and dynamically identifies formal unconditional transfer statements. If syntactic ambiguity is detected, the parser consults the parsing table to identify the content of the phrase. The principle of local optimization improves the quality of the content and the accuracy of the recognition results.

Part-of-speech recognition of phrases is a key step in the intelligent recognition algorithm of machine translation; it handles the grammatical ambiguity of a large number of sentences, phrases, and words [17]. Each sentence is divided into words by POS tagging the contents of the phrase corpus. Each word in an English sentence stands on its own, whereas Chinese sentences need word segmentation. After the words are aligned, phrases are formed. During this process, the part of speech of each word is marked by judging the context of the sentence being translated. Finally, the sentence syntax tree is formed by analyzing the dependencies between phrases. This method improves the timeliness and accuracy of machine translation and significantly increases the capacity to process the phrase corpus.
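Context-dependent part-of-speech marking can be illustrated with a small lexicon-based sketch. The lexicon, the tag names, and the two disambiguation rules below are assumptions chosen for illustration; they are not the tagger used in this paper.

```python
# Hypothetical lexicon: each word maps to its possible tags, most likely first.
LEXICON = {
    "i": ["PRON"],
    "want": ["VERB"],
    "to": ["PART"],
    "the": ["DET"],
    "a": ["DET"],
    "book": ["NOUN", "VERB"],
    "flight": ["NOUN"],
}


def tag(words):
    """Tag each word, using the previous tag to resolve ambiguous words."""
    tags = []
    for word in words:
        options = LEXICON.get(word.lower(), ["NOUN"])  # unknown words: NOUN
        previous = tags[-1] if tags else None
        if len(options) > 1 and previous == "DET" and "NOUN" in options:
            choice = "NOUN"   # a determiner is followed by a noun
        elif len(options) > 1 and previous == "PART" and "VERB" in options:
            choice = "VERB"   # infinitival "to" is followed by a verb
        else:
            choice = options[0]
        tags.append(choice)
    return tags
```

Here "book" is tagged as a verb in "to book a flight" and as a noun in "the book", which is the kind of ambiguity the context judgement is meant to resolve.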

3.3. Improved GLR Algorithm

The function of the GLR algorithm in the design is to judge the contextual relationship of phrases, and its basic principle is the dynamic identification of forms and unconditional transfer statements [18]. The classical GLR algorithm uses multiple shift and reduce operations at each step. During the calculation, the beginning and end of each operation are shown using the envoy’s criteria. When the GLR algorithm does not detect grammatical ambiguity, the phrase translation process will be recalibrated. If syntactic ambiguity is detected, the parsing table is consulted to extend the recognition of the content of the phrase, following the principle of locally optimal selection: the symbols on the different paths are recognized, and the optimal result is selected according to the recognition results. Generally speaking, the result of part-of-speech recognition based on the GLR algorithm is somewhat random, and recognized words have a high probability of overlapping. In this paper, a random matrix is used to improve the GLR algorithm, and the phrase center is used to analyze the existing phrase structure. The following formula uses the GLR algorithm based on a quaternion cluster to calculate the contextual likelihood of phrases [19]:

where S is the start character cluster, P is the production formula set, Vn is the cyclic character cluster, Vt is the end character cluster, and α is the phrase action cluster.

If P represents any recognition action in α and exists in Vn, the following formula can be derived:

In the formula, θ, d, x, and δ represent the symbol on the right side of the action, the constraint value, the center point symbol, and the marking method, respectively. Only when the recognition result exceeds 3 criteria can a phrase be considered as a recognition result [8].

The analysis proceeds as follows. The initial state is pushed onto the stack, the analysis pointer is set to the first input symbol to be analyzed, and the termination flag is cleared. While the termination flag is not set, the current input symbol is mapped to a terminal of the analysis table using a mapping function, and the table is checked to determine which operation is performed next. If it is a shift, the current state and current symbol are pushed onto the stack and the analysis pointer advances to the next input symbol. If it is a reduction, the constraint function is called to check whether the reduction condition is satisfied; the syntax tree composed of the nodes popped from the symbol stack is then constructed and pushed onto the symbol stack, the intermediate states are popped from the state stack, and the new state is pushed onto the state stack. If the table entry is “error”, the analysis has failed and the initial state is restored. The process continues with the next action in turn until the analysis is complete.
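The stack-driven loop described above can be sketched for a deterministic table. This is a plain LR driver rather than the generalized (GLR) variant, which would additionally fork the stack when a table cell holds several actions. The toy grammar (E -> E + n | n), the hand-built `ACTION` and `GOTO` tables, and the `constraint_ok` callback are assumptions made for illustration.

```python
# Toy grammar: rule 1 is E -> E + n, rule 2 is E -> n.
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (left side, right-side length)
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def parse(tokens, constraint_ok=lambda lhs, children: True):
    """Return a syntax tree for tokens, or None if the analysis fails."""
    symbols = list(tokens) + ["$"]   # "$" marks the end of input
    states = [0]                     # state stack, starting in the initial state
    nodes = []                       # symbol stack holding tokens and subtrees
    pos = 0                          # analysis pointer
    while True:
        entry = ACTION.get((states[-1], symbols[pos]))
        if entry is None:            # "error" cell: analysis failure
            return None
        kind, arg = entry
        if kind == "shift":
            nodes.append(symbols[pos])
            states.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, size = RULES[arg]
            children = nodes[-size:]
            if not constraint_ok(lhs, children):  # hypothetical constraint hook
                return None
            del nodes[-size:]
            del states[-size:]
            nodes.append((lhs, children))
            states.append(GOTO[(states[-1], lhs)])
        else:                        # accept
            return nodes[-1]
```

For the input `["n", "+", "n"]` the driver shifts `n`, reduces it to `E`, shifts `+` and `n`, and reduces again, leaving a single tree on the symbol stack when the end marker is reached.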

This section quantitatively analyzes the importance of the indices according to the metric matrix obtained. For a function f(x), the influence of a single index on the whole function can be expressed by holding the other indices fixed and varying the kth index on its own. The same is done for the distance formula: the other characteristic indices are fixed and the kth characteristic index is varied separately. To investigate how a change in one component of the vector (x − y) contributes to the change in distance, we need to find the partial derivative.

But obviously, the partial derivative is not constant; it depends on xk and yk. The ideal importance indicator selection should be a universal constant, independent of the size of the indicator. Therefore, a specific value is chosen for the vector (x − y) such that the importance index can be a constant.
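As a worked sketch of this step, suppose (for illustration only; both choices are assumptions) that the squared distance is the quadratic form $D^2 = z^{\top} A z$ with $z = x - y$ and a symmetric metric matrix $A = (a_{kj})$, and that the fixed value chosen for $z$ is the all-ones vector.

```latex
\[
\frac{\partial D^{2}}{\partial z_{k}}
  = 2\sum_{j=1}^{n} a_{kj}\, z_{j}
  = 2\, a_{k} z ,
\qquad z = x - y ,
\]
\[
z = (1, 1, \ldots, 1)^{\top}
\;\Longrightarrow\;
\frac{\partial D^{2}}{\partial z_{k}} = 2\sum_{j=1}^{n} a_{kj} .
\]
```

Under these assumptions the derivative no longer depends on the sample values, only on the kth row $a_k$ of the metric matrix, which is what makes it usable as a constant importance measure.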

Let the generalized metric matrix be A and let the two samples be x = (x1, x2,…, xn) and y = (y1, y2,…, yn); the distance D(x, y) between them can be calculated by

The importance of the kth index is given by the following formula:

where ak is the vector formed by the kth row of the metric matrix A. The importance wk of each index is thus analyzed from the generalized metric matrix. The index weights can then be constructed from the importance of the indices; that is, the weight of the lth index can be calculated as follows:

If A is the identity matrix, each index has the same weight according to formula (6); if A is the diagonal matrix,

According to formula (7), the weight of the Wk index is determined by k, which is consistent with the significance reflected by the special form of the metric matrix.
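The weighting scheme can be sketched numerically under explicit assumptions: the distance is taken to be the square root of the quadratic form (x − y)ᵀA(x − y), the importance of the kth index is taken to be the absolute row sum of A (the value obtained when x − y is the all-ones vector), and the weights are the importances normalized to sum to one. The function names are hypothetical, and these forms are one reading of the text rather than its confirmed formulas.

```python
import math


def generalized_distance(x, y, metric):
    """Square root of (x - y)^T A (x - y) for a metric matrix A (assumed form)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    quad = sum(diff[i] * metric[i][j] * diff[j]
               for i in range(n) for j in range(n))
    return math.sqrt(quad)


def index_weights(metric):
    """Normalized importance of each index, taken as the absolute row sum of A."""
    importance = [abs(sum(row)) for row in metric]
    total = sum(importance)
    if total == 0:
        raise ValueError("metric matrix gives zero total importance")
    return [value / total for value in importance]
```

With the identity matrix every index receives the same weight, and with a diagonal matrix each weight is proportional to the corresponding diagonal entry, which matches the two special cases discussed above.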

3.4. Constructing a Random Matrix Translation Model Based on the Improved GLR Algorithm

The model design plans the overall functions needed for English translation recognition; the design process is shown in Figure 2. The model collects words, processes their parts of speech, and outputs the results. The data acquisition device collects speech signals and feeds vocabulary signals into the processing system through audio input devices. After the data signals are processed, the results are output to the corresponding client and display, where users can view all of the automatic recognition results for English translation [20]. The improved GLR algorithm adopts an analysis table, a graph-structured stack, and a shared packed forest to improve analysis speed, and it effectively resolves conflicts in the analysis table. The syntactic analysis algorithm in machine translation consists of the analysis state table, the grammar rules, and the graph-structured stack. The analysis table mainly guides the analysis actions, and the graph-structured stack stores the history of the analysis process. The preprocessing stage of the GLR-based sentence analyzer mainly includes word segmentation, part-of-speech tagging, and construction of the analysis transfer table.

One of the core techniques used by the parser is the graph-structured stack, which is a directed acyclic graph. It has two types of nodes: part-of-speech nodes and state nodes, where each state node corresponds to a parser state. Given the grammar rules and an input string, the states are constructed from the rules. The state nodes form a disjoint set, and one state is added to the graph-structured stack at a time. Once the first state node has been initialized for analysis, all “reduce” actions available in the current state are performed.

After the model design, the detailed design is carried out, and English signals are collected and processed. English speech signals are subject to interference, such as users’ non-standard pronunciation and poor signal collection capability, so speech signal information cannot be collected with 100% accuracy. The collected speech signals therefore need to be processed, including framing, weighting, endpoint detection, and windowing [21]. To improve the efficiency of the system and reduce interference from data unrelated to the voice signal, the relevant data are consolidated so that the parameter characteristics can be found and used in subsequent calculation. After the speech signal is weighted, framed, and windowed, the spectrum of each short-time analysis window is obtained through a fast Fourier transform, and a Mel filter is then applied to obtain the two-dimensional MFCC map. This method is used to extract the relevant speech signal parameters, in line with the requirements of intelligent English speech recognition.
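The front end of this feature extraction can be sketched in pure Python. "Weighting" is taken here to mean pre-emphasis, the window is assumed to be a Hamming window, and a naive discrete Fourier transform stands in for the fast Fourier transform; the function names, the coefficient 0.97, and the frame sizes are illustrative assumptions. The Mel filter bank and cepstral steps are omitted, with only the standard Hz-to-Mel conversion shown.

```python
import cmath
import math


def pre_emphasis(signal, coeff=0.97):
    """Boost high frequencies: y[t] = x[t] - coeff * x[t - 1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[t] - coeff * signal[t - 1]
                          for t in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window coefficients for a frame of n samples (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Windowed power spectrum of one frame via a naive DFT (bins 0..n/2)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def hz_to_mel(freq_hz):
    """Common Hz-to-Mel conversion used when placing Mel filters."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)
```

In practice a library FFT would replace the naive transform, which is quadratic in the frame length and is used here only to keep the sketch free of external dependencies.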

4. Experimental Analysis of Model English Translation

4.1. Validation of Model

To fully verify the validity of the intelligent recognition model for English translation, an experiment is carried out in which the model is tested on English translation proofreading. The experimental data are recorded and the system performance is analyzed. The experiment uses a proofreading vocabulary of 400 characters, 500 short texts for proofreading, and a word recognition speed of 25 kB/s. The model balances noise by learning from different samples, so that the chosen parameters enhance its recognition ability to a certain extent, and a sequence of operations reduces the likelihood of overfitting during training. In the experiment, the model is applied to the main task by paying more attention to hidden-layer information and mining hidden patterns at different training levels through global weight sharing. The accuracy of English translation before and after proofreading is compared in Figure 3.

As can be seen from the figure, the highest accuracy of the English translation results before proofreading is 75.1%, whereas the accuracy reaches 99.1% after the intelligent recognition model in this paper is applied. The large gap between the two verifies the effectiveness of the intelligent recognition model of English translation in the system. The scores of both models increase as the data set grows. At the current translation position, the prediction accuracy of dynamic word alignment is obtained after the correct translation alignment position is given, and the prediction accuracy is improved compared with the dynamic word alignment method. This reflects characteristics of the languages themselves. For example, English has many function words such as formal subjects, and when translating into Chinese, auxiliary words are often added to make the sentence coherent. Words that only make the language flow have no exact translation, so leaving them unaligned is acceptable. The figure shows the true alignment of the phrase list. Some phrases even contain punctuation marks; these result from incorrect alignment and need to be filtered out. Such misalignment arises from causes internal to the alignment tool’s algorithm as well as from external factors. The strategy adopted in this paper is to filter with the Bing dictionary. The neural machine translation model with shared-weight multitask learning performs better than traditional models under both low- and high-resource conditions.

4.2. Model Recognition Accuracy

Experimental phrases and random sentences from the web were evaluated for phrase recognition. The corpus was randomly divided into three parts: a training corpus of 90,000 sentences, a development corpus of 8,000 sentences, and a test corpus of 1,000 sentences. The detailed experimental results are shown in Figure 4. BLEU (Bilingual Evaluation Understudy) is a string matching algorithm used for standardized numerical evaluation of translation quality in cross-language machine translation; the BLEU score gives practitioners a basic measure of translation quality. The BLEU metric is the main index for evaluating the output quality of a machine translation system. Its basic idea is to compare the machine-translated text under test with the reference text and to express the similarity between the two as a score. The score ranges from 0 to 1, and a higher BLEU score indicates greater similarity.
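A simplified sentence-level version of the metric can be sketched as follows. It uses a single reference, clipped n-gram precision up to 4-grams, a geometric mean, and the usual brevity penalty, with no smoothing; the function names are hypothetical, and this is not claimed to be the exact evaluation script used in the experiments.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU in the range 0 to 1."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0   # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1.0 - len(reference) / len(candidate))
    return penalty * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to the reference scores 1, and a candidate sharing no words with it scores 0. Scores reported on a 0 to 100 scale are this value multiplied by 100.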

According to the test results in the figure, machine translation based on the improved GLR algorithm is the best of its kind in recognition accuracy, recognition speed, and updating ability. The highest score, 93.2 points, was obtained with the improved GLR algorithm, and the lowest score, 75.4 points, with the statistical algorithm. There was little difference between the dynamic memory algorithm and the improved GLR algorithm in the final test; the main difference between the two was in updating ability. The comparison also includes an experiment on actual translation cases: sentences were selected for translation, and the machine translations produced by the statistical algorithm, the dynamic memory algorithm, and the improved GLR algorithm were compared with human translation, as shown in Figure 5.

It can be seen from the figure that the machine translation produced by the GLR algorithm is the correct translation, and that it is the closest to human translation. Machine translation based on the improved GLR algorithm is clearly more accurate than the dynamic translation memory algorithm. Its recognition accuracy can reach more than 95.5%, which is equivalent to the level of human translation. The comparison results show that the improved GLR algorithm is efficient and feasible in intelligent English translation.

4.3. The Model Identifies the Distribution of Nodes

To fully display the advantages of the designed model, an intelligent recognition model based on syntax and phrases is used for a comparative experiment. A simple bag-of-words model is used as the statistical method to construct the matrix. For some of the experimental parameters of the matrix model, the optimal settings found for the matrix model in the academic paper recommendation task are adopted. The distribution of the node control points identified by the systems is shown in Figure 6.

The compact distribution of node control points in the figure on the left indicates that the system has high recognition performance and more accurate proofreading results, and that it solves the problem of contextual incoherence in English translation. In the figure on the right, the syntactic and phrase-based recognition system has a loose distribution of node control points overall but a compact distribution in the first, fourth, and fifth experiments, indicating that the system has high calibration accuracy but that the coherence of its translation results is poor. The proportion of stop words, which are easy to predict in translation, increases, and the prediction of content words is greatly affected, which limits the improvement in content word prediction accuracy. The node control points of that system alternate between loose and compact distributions, which indicates that the system is not stable. The intelligent recognition model of English translation designed in this paper has high proofreading accuracy; it can identify incoherence in English translation results and give translation results that are coherent and reasonable in context.

4.4. English Translation Model Grammar Evaluation

About 1000 types of grammatical errors in English translation were counted. In the original annotated corpus, the classification of grammatical errors is detailed, but the number of types is too high, so the types were consolidated according to the annotation standard. For each resulting error type, the number of tagged errors, the number of errors detected by the model, and the number of errors corrected properly are counted. From the correction results, the accuracy for each grammatical error type is calculated, as shown in Figure 7.

As can be seen from the figure, the model is effective at correcting errors in articles and determiners, singular and plural nouns, verb forms, and modal verbs, and especially subject-verb agreement errors and missing verbs. The prediction accuracy of stop words in the target language is based on the model and dynamic word alignment, and it is always high: the number of stop words is only a quarter of the number of content words, and stop words form a limited set that is more predictable than content words. The improvement is mainly due to the expanded feature extraction range, and the improved cluster (beam) search obtains more accurate inference results in decoding. To obtain the optimal probability threshold for the improved cluster search, a test experiment was conducted on the selection of the threshold. When the value of P was set to 0.95, a high accuracy rate was retained to the greatest extent while more candidate results were still considered.
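How a probability threshold such as P = 0.95 can trade accuracy against the number of candidates is sketched below. The pruning rule shown, keeping the most probable candidates until their cumulative probability reaches P, is only one plausible reading of the threshold; the function name and the example distribution are hypothetical.

```python
def prune_by_cumulative_probability(candidates, p=0.95):
    """Keep the most probable candidates until their total probability reaches p.

    candidates maps each candidate to its probability (assumed to sum to 1).
    """
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    kept = []
    total = 0.0
    for candidate, probability in ranked:
        kept.append((candidate, probability))
        total += probability
        if total >= p:
            break
    return kept
```

A higher threshold keeps more low-probability candidates alive during decoding, while a lower one prunes more aggressively; under this reading, 0.95 sits near the point where additional candidates stop improving accuracy.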

5. Conclusion

With the continuous progress of globalization, the frequency of English usage has increased greatly, and English plays an indispensable role in the development and communication of science and technology, economy, politics, culture, and other fields. The bilingual corpus can accurately mark the parts of speech of short words, improve the accuracy of the automatic phrase recognition algorithm in English-Chinese intelligent translation from the perspective of optimal selection, and assist the recognition ability of the English-Chinese intelligent translation model. The syntactic and phrase-based recognition system has a loose distribution of node control points but has a compact distribution of node control points in the first, fourth, and fifth experiments, indicating that the system has a high calibration accuracy. Judging from the recognition accuracy, recognition speed, and updating ability, the machine translation based on the improved GLR algorithm is the best of its kind. With the increase in data sets, the performance of the English translation model is superior to that of the traditional model under both low- and high-resource conditions.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the School of Foreign Languages (Office of International Exchange & Cooperation), Xinyang Agriculture and Forestry University.