Abstract

In the context of globalization, English is the international common language and is used ever more widely. However, every industry faces a shortage of people with high English proficiency, and the needs of the Chinese market call for more attention to students' English language ability. Deep learning is a machine learning method based on feature learning and hierarchical feature structures. Loosely modeled on how the human brain analyzes and learns, it transforms features layer by layer, training one single-layer network at a time. In this way it maps samples from their original feature space into a new feature space in which classification or prediction becomes easier. Current online machine translation (MT) still has defects. In particular, the server compares data across languages over the full text to learn the grammatical and textual rules relating them, and this makes MT inefficient and inaccurate. Other modern intelligent recognition technologies are therefore needed to achieve accurate English MT. In general, the closer the training data is to the domain of the target text, the better the sentence alignment, and the more sentence pairs are available, the easier it is to learn accurate translation rules and thus obtain robust translations.

1. Introduction

In the context of globalization, English is the international common language and is used ever more widely. However, every industry lacks people with strong English ability, and the needs of the Chinese market call for more attention to students' English language ability. At present, developers of translation products need to evaluate the quality of their products' automatic MT and analyze how well the products work in use, while users need to know which product offers the best automatic MT quality so that they can decide which one to use [1]. Within English language ability, spoken English is key: using English in everyday communication requires strengthening pronunciation correction and practice. In the past, this was taught entirely by teachers, which required substantial human resources and was difficult to implement. It is therefore particularly important to develop an intelligent recognition model for English translation. MT technology consequently has large market demand and good development prospects. Earlier MT technologies all have drawbacks to some degree, and low translation accuracy is the main bottleneck hindering further development of MT technology [2, 3]. Other products are usually evaluated by measuring a given attribute against a fixed standard, but no such fixed standard exists for the quality of automatic MT, so it is difficult to evaluate with high precision [4, 5]. Traditional intelligent recognition models for English translation also have problems: they cannot accurately recognize English translations or correct students' pronunciation, and they may even mislead students' pronunciation and hinder their English learning.

Some current methods for evaluating the quality of automatic MT suffer from low accuracy. To address this problem, this paper introduces a deep learning algorithm. Although neural networks had long been familiar to most researchers, deep learning did not receive widespread attention until the 2000s [6, 7], mainly because researchers then found an effective way to train multilayer neural networks [8, 9]. During training, a simple encoder-decoder model compresses a source sentence of arbitrary length into a fixed-length vector, so longer sentences suffer from information loss. In deep-learning-based intelligent recognition for MT, part-of-speech recognition of phrases is particularly important. By learning from and training on the bilingual words in machine-translated text, a deep learning algorithm can extract the language vector features of that text with high precision and thus evaluate its semantic quality with high precision [10]. Deep learning first achieved great success in speech recognition and image recognition and then quickly spread across many research directions in natural language processing [11]. MT methods are mainly rule-based or statistics-based, and statistics-based MT is the current research focus. Statistics-based MT generally requires a large parallel corpus, and MT models for widely spoken languages are built on large-scale parallel bilingual corpora. Because resources for minority languages are relatively scarce, however, large bilingual corpora are difficult to build for them, which limits the use of statistical MT models for minority languages [12, 13].

The quality evaluation model of automatic MT based on a deep learning algorithm applies deep learning to both feature extraction and quality evaluation, achieving high-precision extraction of the language features of machine-translated text and high-precision evaluation of translation quality [14]. In general, the closer the training data is to the domain of the target text, the better the sentence alignment, and the more sentence pairs are available, the easier it is to learn accurate translation rules and thus obtain robust translations. With the development of modern intelligent recognition technology, many intelligent MT tools have appeared. However, the output of online MT still has defects. In particular, the server learns the grammatical and textual regularities between languages by comparing data across languages over the full text, and this approach is inefficient and inaccurate. Other modern intelligent recognition technologies should therefore be used to achieve accurate English MT [15].

Reference [16] observed that in practical testing of MT products such as Baidu Translate and Google Translate, the results differ considerably from the quality of professional human translation. This shows that some MT systems can no longer meet current translation needs and that the market urgently needs high-performance, high-accuracy MT technology. Advances in computing and big-data analysis have produced many databases of translated text [17]. Rule-based systems have difficulty making effective use of such new resources to improve translation performance automatically, so rule-based MT is gradually being replaced by newer methods. Reference [18] shows that the core idea of computer-aided translation is twofold. On the one hand, machine output is treated as an auxiliary reference, and users judge its strengths and weaknesses and make the final selection manually. On the other hand, a corpus can classify and organize the vocabulary of various industries, improving translation quality and bringing it closer to users' actual needs. Reference [19] addressed the problem of the Chinese passive voice in MT, evaluating the MT quality of Chinese passive constructions with Google Translate and Youdao Translate as examples. Using big-data analysis, Reference [20] shows that storing bilingual phrase data in a corpus allows accurate part-of-speech labeling of short Chinese and English phrases, standardizes the function of each phrase, and greatly improves the accuracy and speed of automatic phrase recognition in English-Chinese MT. Its drawback, however, is that errors are transmitted and accumulate from step to step, which lowers translation accuracy.
Reference [21] shows that statistical MT treats the correspondence between the source language and the target language as a probability problem: any target-language sentence is regarded as a possible translation candidate for a source-language sentence, but different candidates have different probabilities. Reference [22] proposes that English text MT takes the punctuation-delimited sentence as its basic unit, and that the words in short sentences can be segmented by tagging the content of a phrase corpus. The corpus therefore plays an important role in the intelligent English translation model. Using big-data analysis, References [23, 24] show that statistics-based MT systems clearly outperform the other two approaches (rule-based and example-based) in robustness and scalability: they handle linguistic ambiguity naturally, allow a high-performance translation system to be built quickly from an existing corpus, and improve translation performance automatically as more language data becomes available. References [25, 26] describe a rule-based English MT model built on a semantic network; its implementation uses an English MT method that combines phrase-level semantic statistics with vector mixing, and its translation similarity model relies on direct, active human-machine interaction during measurement. Local training optimizes the ranking of local translation candidates at each node of the forced-decoding tree built from the training data. Reference [27] proposed an example-based MT method, which starts from existing translation experience and knowledge and translates new source-language sentences by analogy and related principles.
The example-based method divides a source-language sentence into phrases that already appear in the stored translation knowledge, matches these phrases against the stored examples by analogy and related methods to obtain their translations, and then splices the translated phrases into a target-language sentence.
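The segment-match-splice procedure can be sketched in Python. This is a minimal illustration under simplifying assumptions of ours: the stored translation knowledge is a plain dictionary from source phrases to target phrases (the entries are invented toy data), matching is exact greedy longest-match rather than analogy, and the translated phrases are spliced in source order with no reordering.

```python
def ebmt_translate(sentence, examples, sep=""):
    """Greedy longest-match EBMT sketch: segment the sentence into phrases found in
    `examples` (source phrase -> target phrase), translate each, and splice them in order."""
    words = sentence.split()
    max_len = max(len(src.split()) for src in examples)
    out, i = [], 0
    while i < len(words):
        # Try the longest stored phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in examples:
                out.append(examples[phrase])
                i += n
                break
        else:
            out.append(words[i])  # no stored example covers this word: copy it through
            i += 1
    return sep.join(out)


# Toy example base (invented); Chinese output is joined without spaces.
EXAMPLES = {"machine translation": "机器翻译", "is": "是", "very useful": "非常有用"}
print(ebmt_translate("machine translation is very useful", EXAMPLES))  # 机器翻译是非常有用
```

A real example-based system would also retrieve partially matching examples and adapt their word order; exact matching keeps the sketch short.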

Based on deep learning, this paper studies the design of an intelligent recognition model for English translation. It uses phrase head words to design and refine the phrase structures in the algorithm, analyzes the syntactic functions recorded in a linear table, and resolves ambiguities in part-of-speech recognition and in English-Chinese structural correspondence. This addresses the low recognition accuracy of traditional methods and provides a sound method for phrase recognition. We also design a HowNet-based lexical semantic similarity measure and a log-linear model. Through an in-depth study of English and Chinese MT phrase corpora, this paper systematically distinguishes the tenses of different phrase corpora and labels the English and Chinese phrase corpora; the corpus labeling scheme has three parts: level, data, and processing.

3. Principle and Model of Deep Learning

The quality evaluation model of automatic MT based on a deep learning algorithm applies deep learning to both feature extraction and quality evaluation, achieving high-precision extraction of the language features of machine-translated text and high-precision evaluation of translation quality. In the deep-learning-based method for extracting language information from automatic MT, learning and training consist of an unsupervised stage followed by a supervised stage. In the unsupervised stage, a denoising autoencoder is trained simultaneously on the bilingual words of both languages to obtain the bilingual semantic features of the two natural languages before and after translation. Deep learning is a very active topic and has made good progress in many areas of machine learning research. Although neural networks had long been familiar to most researchers, deep learning did not receive widespread attention until the 2000s, mainly because researchers then found an effective way to train multilayer neural networks. Training usually has two steps: unsupervised layer-by-layer pretraining, followed by supervised fine-tuning of the parameters. Because the relationship between the raw input signal and the output is generally nonlinear, traditional learning methods must, to learn well, explicitly or implicitly transform the raw input into a feature space in which it is approximately linearly separable with respect to the output.
In the supervised stage, the labeled information of the language corpus is then fed in through the bilingual words to fine-tune the bilingual semantic features of the two natural languages and improve language vector feature extraction. Deep learning has received growing attention in natural language processing and has gradually been applied to a wide range of its tasks. Natural language processing nonetheless has its own characteristics and differs from speech and image processing in two main respects:
(i) In speech and image processing, the input signal is naturally represented in a vector space, whereas natural language processing usually operates at the word level. Converting discrete words into vectors and feeding them to a neural network is therefore the basis for applying neural networks to natural language processing.
(ii) Natural language tasks usually involve various recursive structures. Language modeling, part-of-speech tagging, and similar tasks process sequences, while syntactic parsing, MT, and similar tasks correspond to more complex tree structures. Such structured processing usually requires specialized neural network architectures.
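Point (i) can be illustrated with an embedding lookup table: each word is mapped to an integer id and then to one row of an embedding matrix, which is equivalent to multiplying a one-hot vector by that matrix. The vocabulary, the dimension, and the random initialization are illustrative assumptions; in a real model the matrix is learned during training.

```python
import numpy as np

# Toy vocabulary (assumed); id 0 is reserved for unknown words.
vocab = {"<unk>": 0, "machine": 1, "translation": 2, "model": 3}
dim = 4
rng = np.random.default_rng(0)
E = rng.normal(0.0, 0.1, size=(len(vocab), dim))  # embedding matrix: one row per word


def embed(tokens):
    """Map a token sequence to a (len(tokens), dim) matrix of word vectors."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
    return E[ids]


x = embed("machine translation model".split())
print(x.shape)  # (3, 4)

# Selecting a row is the same as multiplying a one-hot vector by E.
one_hot = np.eye(len(vocab))[vocab["translation"]]
print(np.allclose(one_hot @ E, x[1]))  # True
```

The row lookup is what frameworks use in practice, since it avoids building the sparse one-hot vectors explicitly.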

The overall process of the English translation model based on deep learning and intelligent recognition is shown in Figure 1.

We plan the functions of the intelligent recognition model for English translation and design the overall model. The model performs data acquisition, processing, and output: a data acquisition device collects the voice signal, an audio input device feeds the English signal into the processing system, and the processed results are output to and displayed on the corresponding client. The user can view the automatic recognition results of the English translation on the display or client. The model design flow chart is shown in Figure 2.

To make unsupervised learning more reliable, a denoising autoencoder is used to learn the bilingual words without supervision. Before the vectors y_E and y_C of natural language A and its translation B in a sample pair (y_E, y_C) are reconstructed, a certain amount of noise is added to the pair. Acting as an encoder for both languages, the denoising autoencoder then encodes the corrupted vectors with a sigmoid activation function to obtain the hidden representations k_E and k_C of natural language A and of its translation result B.

Here, R is the sigmoid activation function used by the encoding function; V_E and V_C are the transformation matrix parameters of A and B, respectively, and each carries the characteristics of its own language's bilingual words. Because k_E and k_C have the same dimension, the encoders of natural language A and natural language B share a single offset β.
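For concreteness, the encoding step can be written in the standard denoising-autoencoder form. This is a sketch under our own assumptions rather than the article's formula: \tilde{y}_E and \tilde{y}_C denote the noise-corrupted input vectors, V_E and V_C multiply them directly, and R is applied element-wise.

```latex
k_E = R\left(V_E \tilde{y}_E + \beta\right), \qquad
k_C = R\left(V_C \tilde{y}_C + \beta\right), \qquad
R(x) = \frac{1}{1 + e^{-x}}
```

Because both hidden vectors have the same dimension h, a single offset \beta \in \mathbb{R}^h can be added in both encoders.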

After k_E and k_C are obtained, the denoising autoencoder decodes the two hidden representations in turn. For the hidden representation k_E of natural language A, this paper implements two decoders, one for natural language A and one for its translation result B, which decode k_E into a reconstruction vector of A and a reconstruction vector of B, respectively. Here, Gθ is the decoding function, and d_E and d_C are the decoder offsets of the two natural languages.
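Continuing in the same assumed form, the two decoders can be written with tied weights (each decoder reuses the transpose of its language's encoder matrix) and with G_\theta taken to be an element-wise sigmoid. Tied weights are our assumption, chosen because the unsupervised parameter set {V_E, V_C, d_E, d_C} listed in the text contains no separate decoder matrices. The superscript (s) marks the hidden representation k_s being decoded, so the same two decoders serve both k_E and k_C:

```latex
\hat{y}_E^{(s)} = G_\theta\left(V_E^{\top} k_s + d_E\right), \qquad
\hat{y}_C^{(s)} = G_\theta\left(V_C^{\top} k_s + d_C\right), \qquad
s \in \{E, C\}
```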

The loss function O measures how far the reconstructed vectors are from the original vector pair (y_E, y_C) of the two natural languages.
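One common choice for this loss, assumed here because the text does not specify it, is the squared reconstruction error summed over both hidden representations and both reconstructed languages; cross-entropy is the usual alternative for binary inputs. Writing \hat{y}_E^{(s)} and \hat{y}_C^{(s)} for the reconstructions of y_E and y_C decoded from the hidden representation k_s:

```latex
O = \sum_{s \in \{E, C\}} \left(
      \left\lVert y_E - \hat{y}_E^{(s)} \right\rVert_2^2
    + \left\lVert y_C - \hat{y}_C^{(s)} \right\rVert_2^2 \right)
```

The reconstructions are compared with the clean vectors y_E and y_C, not the noisy ones; that comparison is what makes the autoencoder denoising.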

In the unsupervised stage, the parameters of the decoding function are θ = {V_E, V_C, d_E, d_C}. Gradient descent updates θ so that the loss function O reaches its minimum, which trains V_E and V_C.
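The unsupervised stage can be sketched end to end in NumPy. This is a minimal illustration, not the article's implementation, and it fixes several details the text leaves open. It uses masking noise (each input unit zeroed with probability 0.2), sigmoid encoders and decoders with tied weights, squared reconstruction error, and plain per-sample gradient descent with hand-derived gradients, trained on toy random bag-of-words vectors. It also updates the shared offset β alongside V_E, V_C, d_E, and d_C, which is our choice.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def corrupt(y, rng, drop=0.2):
    """Masking noise (assumed): zero each input unit with probability `drop`."""
    return y * (rng.random(y.shape) >= drop)


def init_params(n_e, n_c, h, rng):
    return {
        "V_E": rng.normal(0.0, 0.1, (h, n_e)),  # language-E matrix (encoder, tied decoder)
        "V_C": rng.normal(0.0, 0.1, (h, n_c)),  # language-C matrix (encoder, tied decoder)
        "beta": np.zeros(h),                    # encoder offset shared by both languages
        "d_E": np.zeros(n_e),                   # decoder offset for language E
        "d_C": np.zeros(n_c),                   # decoder offset for language C
    }


def step(p, y_e, y_c, t_e, t_c, lr):
    """One gradient-descent step on the squared reconstruction loss O for one pair
    (y_e, y_c) with noisy inputs (t_e, t_c). Returns O before the update."""
    grads = {name: np.zeros_like(v) for name, v in p.items()}
    loss = 0.0
    for v_name, t in (("V_E", t_e), ("V_C", t_c)):
        k = sigmoid(p[v_name] @ t + p["beta"])          # encode the noisy input
        r_e = sigmoid(p["V_E"].T @ k + p["d_E"])        # reconstruct language E
        r_c = sigmoid(p["V_C"].T @ k + p["d_C"])        # reconstruct language C
        loss += np.sum((r_e - y_e) ** 2) + np.sum((r_c - y_c) ** 2)
        g_e = 2.0 * (r_e - y_e) * r_e * (1.0 - r_e)     # dO/d(decoder-E pre-activation)
        g_c = 2.0 * (r_c - y_c) * r_c * (1.0 - r_c)     # dO/d(decoder-C pre-activation)
        grads["V_E"] += np.outer(k, g_e)                # decoder-side use of V_E
        grads["V_C"] += np.outer(k, g_c)                # decoder-side use of V_C
        grads["d_E"] += g_e
        grads["d_C"] += g_c
        g_z = (p["V_E"] @ g_e + p["V_C"] @ g_c) * k * (1.0 - k)  # back through the encoder
        grads[v_name] += np.outer(g_z, t)               # encoder-side use of V_E or V_C
        grads["beta"] += g_z
    for name in p:
        p[name] -= lr * grads[name]
    return loss


# Toy data (assumed): 20 aligned pairs of binary bag-of-words vectors.
rng = np.random.default_rng(0)
Y_E = (rng.random((20, 8)) < 0.3).astype(float)
Y_C = (rng.random((20, 6)) < 0.3).astype(float)
params = init_params(n_e=8, n_c=6, h=4, rng=rng)


def clean_loss(p):
    # lr=0.0 evaluates O on noise-free inputs without changing the parameters.
    return sum(step(p, a, b, a, b, lr=0.0) for a, b in zip(Y_E, Y_C))


before = clean_loss(params)
for _ in range(300):
    for a, b in zip(Y_E, Y_C):
        step(params, a, b, corrupt(a, rng), corrupt(b, rng), lr=0.05)
after = clean_loss(params)
print(f"O before: {before:.2f}, after: {after:.2f}")
```

With automatic differentiation (for example JAX or PyTorch) the hand-written gradients would disappear; they are spelled out here so the sketch needs only NumPy.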

In the deep network, unsupervised training treats each layer as a restricted Boltzmann machine (RBM) and trains the weights of each layer with a greedy layer-wise method, proceeding from the bottom layer to the top. The first layer is modeled as a Gaussian-binary RBM and the other layers as binary-binary RBMs. In an RBM, nodes within the same layer are not connected to one another; connections exist only between visible and hidden nodes. Its conditional probability distribution θ1 and joint probability distribution θ2 are defined with the following notation: m() and logistic() are the Gaussian density function and the logistic function, respectively; f_j is the offset of visible unit u_j; t1 indicates that a hidden-layer node takes the value 1; j = 1, 2, 3; and ε is the standard deviation.
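Greedy layer-wise pretraining can be sketched with one-step contrastive divergence (CD-1), a standard RBM training method that the text does not name, so treat it as an assumption. The sketch uses the usual conditionals. In the Gaussian-binary first layer, hidden unit h_i is on with probability logistic(b_i + Σ_j w_ij u_j / ε), and visible unit u_j is Gaussian with mean f_j + ε Σ_i w_ij h_i and standard deviation ε. In binary-binary layers, both directions use the logistic function. The hidden offsets b_i, the weight symbol w_ij, the fixed ε = 1, and the toy data are ours.

```python
import numpy as np


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(U, n_hidden, gaussian_visible, epochs=50, lr=0.01, eps=1.0, seed=0):
    """Train one RBM layer with CD-1. U has shape (n_samples, n_visible).
    gaussian_visible=True gives a Gaussian-binary RBM with fixed std `eps`;
    False gives a binary-binary RBM."""
    rng = np.random.default_rng(seed)
    n, n_visible = U.shape
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # weights w_ij (visible j, hidden i)
    f = np.zeros(n_visible)  # visible offsets f_j
    b = np.zeros(n_hidden)   # hidden offsets (assumed; not named in the text)
    s = eps if gaussian_visible else 1.0
    for _ in range(epochs):
        # Positive phase: p(h = 1 | u) on the data, then a binary sample of h.
        ph0 = logistic((U / s) @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: reconstruct the visible layer from the sampled hiddens.
        if gaussian_visible:
            u1 = f + s * (h0 @ W.T)          # mean of the Gaussian p(u | h)
        else:
            u1 = logistic(f + h0 @ W.T)      # p(u = 1 | h), used as a mean-field value
        ph1 = logistic((u1 / s) @ W + b)
        # CD-1 updates: data statistics minus reconstruction statistics.
        W += lr * ((U / s).T @ ph0 - (u1 / s).T @ ph1) / n
        f += lr * (U - u1).mean(axis=0) / s**2
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, f, b


def greedy_pretrain(U, layer_sizes, eps=1.0):
    """Bottom-up greedy pretraining: Gaussian-binary first layer, binary-binary above.
    Each layer's hidden probabilities become the next layer's input."""
    layers, x = [], U
    for i, n_hidden in enumerate(layer_sizes):
        gaussian = i == 0
        W, f, b = train_rbm(x, n_hidden, gaussian_visible=gaussian, eps=eps, seed=i)
        layers.append((W, f, b))
        x = logistic((x / (eps if gaussian else 1.0)) @ W + b)
    return layers


rng = np.random.default_rng(42)
U = rng.normal(size=(100, 10))  # toy standardized real-valued inputs
layers = greedy_pretrain(U, [8, 4])
print([W.shape for W, _, _ in layers])  # [(10, 8), (8, 4)]
```

After pretraining, the stacked weights would initialize a feedforward network that the supervised stage fine-tunes.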

4. Design of Intelligent Recognition English Translation Model

4.1. Design of Intelligent Recognition English Translation Model Based on Deep Learning

The corpus plays an important role in the deep-learning-based intelligent translation model. It stores bilingual phrase data, accurately labels the part of speech of short English and Chinese phrases, standardizes the function of each phrase, and improves the speed and accuracy of the automatic phrase recognition algorithm during English-Chinese MT, making English-Chinese MT more accurate. If clear domain labels are available for the test or training data, the data can be processed by category and a separate translation model trained for each domain. When the test data changes, it then only needs to be classified, and the translation model for its domain category is selected. This approach suits the maintenance of statistical MT systems and supports data accumulation and long-term planning. In translation, the semantics carried by the phrases of a sentence are usually its core content, so intelligent phrase identification is an important step in language recognition. Its principle is to identify and group the phrases in a sentence, analyze their part of speech and syntax, translate them against the phrase corpus and combine the results automatically, and finally obtain the translation of the original sentence. Compared with the major breakthroughs in speech and image processing, deep learning research in natural language processing is still exploratory, although it has achieved certain results. In the translation of scientific and technical documents, the training data usually comes from many sources and domains and cannot fully match the domain of the target text. Optimizing translation cost and quality for the target text therefore raises the problem of “domain adaptation.”
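The per-domain scheme described above (group labeled data by domain, train one model per domain, and select the model by the test sentence's domain label) can be sketched as follows. Everything concrete is a placeholder of ours: a "model" here is only a source-to-target lookup table, and the labeled examples are invented to show why the same word may need a domain-specific translation.

```python
from collections import defaultdict


def train_per_domain(labeled_pairs):
    """Group (domain, source, target) triples by domain label and build one toy 'model'
    per domain. Here a model is only a source -> target lookup table (placeholder)."""
    models = defaultdict(dict)
    for domain, src, tgt in labeled_pairs:
        models[domain][src] = tgt
    return dict(models)


def translate(domain, sentence, models, fallback="general"):
    """Pick the model for the sentence's domain label; use the general model if none exists."""
    model = models.get(domain, models[fallback])
    return model.get(sentence, sentence)  # unknown input is returned unchanged


# Invented labeled data: the same English word translates differently per domain.
labeled = [
    ("medical", "case", "病例"),
    ("legal", "case", "案件"),
    ("general", "case", "情况"),
]
models = train_per_domain(labeled)
print(translate("medical", "case", models))  # 病例
print(translate("legal", "case", models))    # 案件
print(translate("finance", "case", models))  # 情况 (no finance model: falls back)
```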

Through an in-depth study of English and Chinese MT phrase corpora, this paper systematically distinguishes the tenses of different phrase corpora and labels the English and Chinese phrase corpora. The corpus labeling scheme has three parts: level, data, and processing. The data are in text format, the level uses alignment and part of speech, and processing relies on direct, active human-machine interaction. A series of routine English translation operations is then carried out to improve the accuracy of phrase corpus translation. Local training optimizes the ranking of local translation candidates at each node of the forced-decoding tree built from the training data. This objective moves the neural network parameters into a better region, but it still does not train the model on end-to-end translation performance. On top of local training, we therefore carry out supervised global training that directly optimizes the model's performance in generating the final translations. In MT, intelligent phrase recognition is a key technology: it supports tone selection for translation samples and accurate alignment of parallel corpora, and it effectively reduces grammatical ambiguity. A simple encoder-decoder model can solve the MT problem but has shortcomings: during training, it compresses a source sentence of arbitrary length into a fixed-length vector, so longer sentences suffer from information loss. A large number of sparse features helps characterize translated phrase pairs, so we take arbitrary sparse features as part of the representation of a translated phrase pair, treat each high-frequency translated phrase pair as a feature in its own right, and extract words, word pairs, part-of-speech tags, and other features from the translated phrase pairs.
In deep-learning-based intelligent recognition for MT, part-of-speech recognition of phrases is particularly important, because it can resolve many grammatical ambiguities in phrases, sentences, and words. Words in short sentences can be segmented by tagging the content of the phrase corpus, so the corpus plays an important role in the intelligent English translation model. Storing bilingual phrase data in the corpus allows accurate part-of-speech labeling of short Chinese and English phrases, standardizes the function of each phrase, and greatly improves the accuracy and speed of the automatic phrase recognition algorithm in English-Chinese MT.
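The sparse phrase-pair features listed above (words, word pairs, part-of-speech tags, and an indicator for high-frequency phrase pairs) can be sketched as a set of feature strings that is then mapped to integer ids. The feature names, the use of every source-target word combination as a "word pair" (no word alignment is assumed), the frequency threshold, and the toy counts are illustrative assumptions.

```python
from collections import Counter


def phrase_pair_features(src, tgt, src_tags, pair_counts, min_count=2):
    """Return the binary sparse features of one translated phrase pair as a set of strings.
    src, tgt: token lists; src_tags: POS tags of src; pair_counts: phrase-pair frequencies."""
    feats = {f"src_w={w}" for w in src}
    feats |= {f"tgt_w={w}" for w in tgt}
    feats |= {f"pair={s}|{t}" for s in src for t in tgt}  # every source/target word combination
    feats |= {f"src_pos={p}" for p in src_tags}
    feats.add("pos_seq=" + "_".join(src_tags))
    key = (" ".join(src), " ".join(tgt))
    if pair_counts[key] >= min_count:                     # frequent pair becomes its own feature
        feats.add(f"phrase_pair={key[0]}|{key[1]}")
    return feats


def to_indices(feats, index):
    """Map feature strings to integer ids (sorted for determinism), growing `index` as needed."""
    return sorted(index.setdefault(name, len(index)) for name in sorted(feats))


pair_counts = Counter({("red apple", "红 苹果"): 5})  # toy corpus counts
feats = phrase_pair_features(["red", "apple"], ["红", "苹果"], ["JJ", "NN"], pair_counts)
print(len(feats))  # 12
print(to_indices(feats, {}))
```

Only the active features are stored, which is what keeps such representations cheap even with millions of possible feature names.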

4.2. Experimental Results and Analysis

Using the deep-learning-based intelligent recognition English translation model designed above, we compare how the number of retrieved documents and the hidden-layer length l affect the accuracy of the translation system. For most results, the best translation accuracy is achieved when the number of retrieved documents is N = 10. This is because retrieving still more documents introduces irrelevant documents into the learning of the neural network. Three experiments were conducted for comparison, as shown in Figures 3 to 5.

The experimental results show that the translation system is more accurate when l is small. In fact, when l ≤ 600, the differences in translation performance are very small. When l = 1000, however, translation accuracy is worse than in the other settings, mainly because the neural network then has so many parameters that it cannot be learned well. An intelligent computer proofreading system for English translation is, in effect, itself an English translation process. Domain adaptation methods for English translation fall into five categories: methods based on data selection, methods based on hybrid models, semi-supervised learning methods represented by self-training, methods based on topic models, and methods based on domain labels. By automatically classifying the domains of the English sentences in the training data, the development set, and the test set to be translated, the domain category of each sentence can be obtained and a domain label set generated for the test set or training set. This label set is used to filter the training data so that its domain is consistent with that of the text to be translated. This study tests the method on a statistical MT system and, using only part of the training data, obtains translation quality comparable to that obtained with the original training data.
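The domain-label filtering step can be sketched as follows: label each sentence with a domain, collect the labels that occur in the test set, and keep only the training pairs whose source sentence carries one of those labels. The keyword lists and sentences are hypothetical toy stand-ins for the automatic domain classifier referred to in the text.

```python
# Hypothetical keyword lists standing in for a trained domain classifier.
DOMAIN_KEYWORDS = {
    "medical": {"patient", "dose", "clinical"},
    "legal": {"contract", "court", "clause"},
}


def domain_of(sentence, default="general"):
    """Assign the domain whose keywords overlap the sentence most (ties: first listed)."""
    tokens = set(sentence.lower().split())
    hits = {d: len(tokens & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else default


def filter_by_domain(train_pairs, test_sentences):
    """Keep training pairs whose source-side domain label appears in the test set's label set."""
    test_labels = {domain_of(s) for s in test_sentences}
    return [(src, tgt) for src, tgt in train_pairs if domain_of(src) in test_labels]


train = [
    ("the patient took one dose", "病人服用了一剂"),
    ("the court rejected the contract", "法院驳回了该合同"),
    ("the weather is nice", "天气很好"),
]
test = ["clinical trials enrolled a new patient"]
print(filter_by_domain(train, test))  # [('the patient took one dose', '病人服用了一剂')]
```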

BLEU values on the test set are calculated for both this method and the traditional syntactic-parsing-based MT method. BLEU is an automatic evaluation metric for MT, and a higher value indicates better translation quality. Two experiments were conducted, and the results are shown in Figures 6 and 7.
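To make the metric concrete, here is a minimal single-reference, sentence-level BLEU: the geometric mean of clipped n-gram precisions for n = 1 to 4, multiplied by a brevity penalty. It is a sketch for illustration only. Reported BLEU scores are normally computed at the corpus level with a standard tool such as sacreBLEU, and the text does not say which implementation was used. Without smoothing, a sentence with no matching n-gram at some order scores 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference sentence BLEU without smoothing (whitespace tokenization)."""
    c, r = candidate.split(), reference.split()
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(r, n)
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(count, ref[g]) for g, count in cand.items())
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / sum(cand.values())) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(c) > len(r) else math.exp(1.0 - len(r) / len(c))
    return bp * math.exp(log_precision)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(round(sentence_bleu("the cat sat on the mat", "the cat sat on the red mat"), 4))
```

In the second call the precisions are 6/6, 4/5, 3/4, and 2/3, and the candidate is one word shorter than the reference, so BLEU = 0.4^(1/4) · e^(-1/6).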

The experimental results show that for simple sentences, the BLEU value of the proposed method is essentially the same as that of the traditional parsing-based MT method. For general sentences, it is slightly higher, and for complex sentences, it is much higher. The advantage of this method therefore lies in the MT of complex sentences. Complex sentences contain many maximal noun phrases. This method adopts a statistical MT approach based on the maximum-entropy principle, which finds the best combination of different English language features in complex sentences, removes some structural ambiguity, and improves the accuracy of English MT. We take phrase-based statistical MT as the baseline system and a feedforward neural network as the framework for studying neural network language models. Many researchers have studied word-based neural language models and demonstrated their effectiveness. Phrase-based statistical MT, however, generates the target translation by combining phrases, so language models that take phrases as the basic unit are also worth studying. In the training stage, translation rules from source-language phrases to target-language phrases are extracted automatically from a sentence-aligned bilingual parallel corpus, and their probabilities are learned. In the translation stage, the source sentence is segmented into a phrase sequence, and the translation rules produce the corresponding target-language phrase sequence. A phrase reordering model and a language model then order that sequence, and the best target translation is finally obtained.
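The maximum-entropy (log-linear) way of combining features can be illustrated by scoring each candidate translation e of a source sentence f with a weighted feature sum, score(e) = Σ_k λ_k h_k(e, f), and choosing the highest score. Normalizing exp(score) over the candidates gives the model's probabilities. The feature names (log translation probability, log language-model probability, reordering penalty), their values, and the weights are invented for illustration; in practice the weights are tuned on a development set.

```python
import math


def loglinear_score(features, weights):
    """Weighted feature sum: sum_k lambda_k * h_k."""
    return sum(weights[name] * value for name, value in features.items())


def rank(candidates, weights):
    """Return (text, probability) pairs, best first; probabilities are a softmax over candidates."""
    scores = {c["text"]: loglinear_score(c["features"], weights) for c in candidates}
    z = sum(math.exp(s) for s in scores.values())
    return sorted(((t, math.exp(s) / z) for t, s in scores.items()), key=lambda x: -x[1])


# Toy candidates for one source sentence; all numbers are invented.
candidates = [
    {"text": "I love you", "features": {"log_tm": math.log(0.5), "log_lm": math.log(0.02), "reorder": 0.0}},
    {"text": "love I you", "features": {"log_tm": math.log(0.5), "log_lm": math.log(0.0001), "reorder": -2.0}},
    {"text": "I like you", "features": {"log_tm": math.log(0.1), "log_lm": math.log(0.01), "reorder": 0.0}},
]
weights = {"log_tm": 1.0, "log_lm": 1.0, "reorder": 0.5}
ranking = rank(candidates, weights)
print(ranking[0][0])  # I love you
```

The language-model feature is what pushes the scrambled word order "love I you" to the bottom even though its translation-model score ties with the best candidate.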

5. Conclusions

The great advantage of phrase-based statistical translation over word-based statistical translation is that the translated phrase pair serves as the basic unit of translation and captures information that cannot be modeled at the vocabulary level. A simple encoder-decoder model can solve the MT problem but has shortcomings. A large number of sparse features helps characterize translated phrase pairs. Training usually has two steps: unsupervised layer-by-layer pretraining, followed by supervised parameter fine-tuning. Intelligent phrase recognition effectively reduces grammatical ambiguity. Nevertheless, existing methods remain weak at processing compositional semantics and cannot accurately describe complex recursive language structures, which limits the performance of MT systems. Introducing existing knowledge bases into deep learning, improving its optimization function, and increasing the discriminative weight of the knowledge base would also improve the efficiency and accuracy of domain discrimination in the deep-learning-based intelligent recognition English translation model for statistical MT. In translation, the semantics carried by the phrases of a sentence are usually its core content, and intelligent phrase identification is an important step in language recognition.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.