Abstract

The accuracy of deep learning translation models is a key index for evaluating the practical performance of engineering English translation. This paper proposes an automatic error detection system for English translation. In grammatical error detection, researchers have gradually shifted their attention from statistical methods to neural network methods. Three deep learning models are established, and the multitask model performs better than the conditional random field model and the LSTM-CRF model. The reason is that the multitask learning model includes auxiliary tasks, which alleviates the data sparsity problem to some extent and enables the model to be fully trained even when the label distribution is uneven. Thus, it performs better than the other models in the task of syntax error detection. The system implements dictionary-based spelling error checking and uses edit distance to suggest corrections for the misspelled words it finds, so that a large number of translations can be checked automatically. On the basis of analyzing the sentence structure characteristics of engineering English translation, this paper also detects subject-verb agreement errors: it constructs the syntactic structure tree of the sentence, identifies the head word of the subject corresponding to the predicate verb, and judges whether the two agree.

1. Introduction

Due to different language environments in each country, language barriers greatly hinder the communication between different countries [1]. In order to break the language barrier and strengthen the communication between countries, engineering English translation came into being. Translation simplifies the process of communication between people from different countries. Nowadays, English is the most frequently used language in the world, and engineering English translation has the most application scenarios and a wider range of applications [2].

Engineering English translation detection and diagnosis is essentially a special speech recognition task. Its input is the same as that of speech recognition: a piece of audio. Its output, however, is different: speech recognition outputs a paragraph of text, whereas detection and diagnosis must output the phoneme corresponding to each audio frame. After outputting the phonemes of the audio, the detection and diagnosis model usually compares them with the standard phonemes of the target sentence in order to detect and diagnose errors. The acoustic model, language model, and decoder are constructed separately. Although the accuracy of this approach is good, its defects are also obvious [3]. First, constructing multiple modules requires specialized domain knowledge. Second, because each module is trained individually, their errors accumulate. In addition, the complexity of multiple modules makes it difficult to migrate to new systems or data. A single overall network, by contrast, has good expansibility: it does not need each module to be designed individually; only the overall network structure needs to be designed.

In view of the above problems, this paper first proposes a deep learning method for qualitatively evaluating translation quality; that is, each translation is roughly classified as “good” or “bad,” and sampling can then be targeted so that translation errors are found more effectively. In this way, the influence of relatively simple translation tasks is avoided, and the overall quality of a batch of translation tasks can be evaluated reasonably, so that translation quality is ensured more effectively. In addition, this paper adopts strategies for automatically inspecting and discovering translation errors. These checks can be run a second time before the final results are submitted, either to find detectable errors in the final sampled translations or to recheck translations rated as poor by the automatic evaluation.

Literature [4] proposed an automatic English text judgment algorithm, which first splits and filters and then extracts optimization and interactive fusion, and designed a BP project English translation evaluation system. After machine evaluation and teachers’ independent evaluation of the same English sentence sample, the test results show that the ETSS system has an excellent performance. However, the system has low accuracy and a high misjudgment rate in automatic detection of English translation errors. The research on automatic engineering English translation detection technology was first carried out on foreign social platforms such as Twitter. Literature [5] mainly studies the engineering English translation detection technology on the Twitter platform. By observing the characteristics of Twitter platform, they select four types of features for detection, whether it contains question symbols, the proportion of positive words and negative words in the message, whether it contains engineering English translation symbols and whether it is to forward the microblog, etc. The characteristics based on users include registration duration and the number of followers, fans, and tweets. Features based on engineering English translation include the proportion of contained in all tweets under engineering English translation, average emotional score. Propagation-based features include the depth of the forwarding structure tree which is formed by the forwarding relationship and the number of original posts translated from engineering English. Then, 15 features with the most distinguishing ability are selected from these features and classified by the J48 decision tree [6].

Literature [7] proposed a deep learning model for engineering English translation for the first time. In the RNN-based method of that paper, the content related to an event is modelled as a variable-length time series, which is used to learn how the semantics of the event change over time; the RNN variants LSTM and GRU are then used to further improve performance. Literature [8, 9] proposed a novel recursive neural network model based on soft attention, which can capture semantically time-varying relations published over time under the same English translation project and generate hidden representations. Then, the attention mechanism is used to make the model focus on the more important parts of the representation, so that engineering English translation detection is performed automatically. Literature [10] provided a novel deep RNN model. Literature [11] proposed a bidirectional tree-like recursive neural network model, with one direction being top-down and the other direction being bottom-up. This model is used to learn and classify the representation of the communication structure of engineering English translation. The results on two public Twitter datasets show that the model not only has better performance but can also find engineering English translation at a relatively early stage. Literature [12] proposed a style approach inspired by generative adversarial networks [13], in which generators are used to generate uncertain or conflicting noises. The authors designed two generators: one is used to distort nonengineered English translations to make them look like engineered English translations, and the other is used to “whitewash” engineering English translations to make them look like nonengineered English translations. The generator’s enhanced data are used to force the discriminator to learn more distinguishing features from low-frequency nontrivial modes.
Literature [14] proposed an end-to-end model similar to generative adversarial network style in order to remove the specific features that are not transferable due to specific engineering English translations and retain the shared features among all engineering English translations. It includes a feature extractor, an engineering English translation detector, and an engineering English translation discriminator. The feature extractor is used to extract text and visual features, which are connected together to form the END multimodal feature representation. Both the engineering English translation detector and the Engineering English translation discriminator are based on the feature extractor. The engineering English translation detector takes learned feature representation as input to predict whether it is true or false, and the engineering English translation discriminator identifies each engineering English translation label based on the joint representation.

Machine translation evaluation and the development of machine translation are complementary to each other. Machine translation evaluation is one of the core issues of translation quality evaluation [15]. In recent years, machine translation evaluation has developed rapidly, and its quality has been receiving a lot of attention from people due to the rapid development of computational linguistics. Because people have different requirements for machine translation evaluation, many methods of machine translation evaluation have emerged. From the perspective of evaluation types, it can be divided into operational evaluation, illustrative evaluation, and classification evaluation [16]. Operational evaluation is mainly used to evaluate the economic value of a translation system, which is a good reference index for consumers. Illustrative evaluation uses evaluation translation to evaluate the performance of the translation system, which is usually subjective. The classification evaluation method can test the translation results of the system through different language phenomena, so as to point out the shortcomings of the system and the direction of improvement, so it is very suitable for researchers and developers [17]. The results of such measurements are often of interest to other machine translation researchers, and they are concerned not only about the performance improvement but also about the reason for the performance improvement. At the same time, this evaluation also strengthens the technical communication between researchers [18–20]. However, due to the limitations of machine translation, the current translation quality is hardly comparable to that of human translation. Therefore, machine translation is only applied in certain limited fields under the special needs of users [21], and the corresponding automatic evaluation of machine translation is also based on the given reference translation.
The evaluation results are obtained by calculating the similarity between candidate translation and reference translation. The original engineering English translation model faces a serious problem; that is, no matter what the length of the source language sentence is, it is encoded as a vector of fixed dimensions. The proposed attention mechanism [22, 23] effectively solves this problem. The basic principle of the attention mechanism is that in the process of translation in the decoder side of the engineering English translation model, the current hidden state of each word in the source language is considered in addition to using the fixed dimension vector of the generated source language sentence. In the process of decoder operation, the decoder will dynamically look for the related source language vocabulary and add the context information contained in the vocabulary into the operation process of the decoder [24]. Therefore, the attention mechanism changes the way of information transmission and can dynamically calculate the source language context most relevant to the current decoding words, thus effectively solving the problem of long-distance information transmission and significantly improving the translation effect of engineering English translation. Therefore, the encoder model based on the attention mechanism has become the mainstream method of engineering English translation and has been widely used.

3. Automatic Error Detection of Engineering English Translation Based on Deep Learning

3.1. Automatic Detection Module for Engineering Translation Errors

From the perspective of grammar, syntax, and word order, the posterior probability of words f is calculated by using the maximum entropy classifier e. The expression is as follows:

The selected voice sensor is used as the main device to collect and store the output speech signals of engineering English translation, as shown in Figure 1.

The ratio of the number of matched n-grams to the total number of n-grams in the candidate translation is calculated. At present, there are mainly two n-gram-based automatic evaluation techniques for translation: one is the BLEU evaluation standard proposed by IBM, and the other is an improved scheme based on BLEU proposed by NIST, which is called the NIST evaluation standard. DARPA uses NIST-based automated translation evaluation tools for its machine translation evaluation in the TIDES program. BLEU is an automatic evaluation method for machine translation based on n-grams. The overall evaluation of BLEU is shown in the following formula:
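The formula itself is not preserved in this text. The commonly published BLEU definition is given here for reference; the notation is an assumption and may differ from the author's original equation:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1, & c > r \\
e^{\,1 - r/c}, & c \le r
\end{cases}
```

Here p_n is the modified n-gram precision (the matched-to-total ratio described above, with clipped counts), w_n is the weight of order n (commonly 1/N with N = 4), c is the length of the candidate translation, r is the effective reference length, and BP is the brevity penalty that discourages overly short candidates.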

In the automatic detection of engineering English translation errors by deep learning, quality is judged mainly by specialists in the target language, who rely on their own professional knowledge and refer to the source language to measure the quality of a translation in three respects: fluency (whether the translation reads smoothly), loyalty (whether the translation faithfully expresses the meaning of the source text), and accuracy (syntactic or semantic correctness). Fluency can reflect the overall quality of a translation. In practical research and application, evaluating the fidelity of a translation is much more difficult than evaluating its fluency. Intelligibility and loyalty are each divided into five grades, among which the loyalty grades are as follows:
(1) The content of the translation is basically consistent with that of the source text.
(2) The translation reflects the content of the original and needs very few modifications.
(3) The translation is basically faithful to the original text, but there are some limitations such as improper word order, inaccurate choice of meaning, improper use of tenses, and errors in the relationship between phrases, singular and plural nouns, adverbial positions, and so on, which need to be processed carefully by post-translation editors.
(4) Some of the original text is carefully translated, while some is not translated; thus, the structure of the original text cannot be entirely translated, leading to many preposition errors, wrong phrase structures, clause judgment errors, content loss, and other phenomena.
(5) The translation basically cannot reflect the content of the original text: many places are left untranslated, or, even if the translation is complete or relatively complete, most of it is unintelligible and can hardly constitute a complete sentence.

3.2. Design of the Deep Learning Algorithm Model for Automatic Translation Error Detection in Engineering English

Feature engineering is the process of learning and extracting features from text, images, and other data sources; that is, it extracts from the raw data those features that best describe the data in a specific domain. These features draw on relevant knowledge of the data field so that the deep learning algorithm can achieve its best performance. Appropriate feature engineering can greatly improve the performance of the model, but it is not the case that the more features are selected, the better. Choosing good features can therefore not only simplify the model but also reduce the running time. In natural language processing, part-of-speech features and syntactic features are often used to transform the original data, and appropriate feature functions are then constructed to improve the performance of the algorithm. One advantage of the CRF (conditional random field) model is that it can define a larger number and a wider variety of feature functions. In this study, a large number of feature functions were constructed by combining part-of-speech features and syntactic features as the inputs of the conditional random field model to achieve better performance.
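The idea of building feature functions from part-of-speech context can be sketched as follows. This is only an illustration under stated assumptions: the template names (`p-1|p0` and so on), the boundary markers, and the Penn-style tags in the example are hypothetical and are not the templates or the tag set used in this study. Syntactic features, such as the dependency relation of each word, could be added to the dictionary in the same way.

```python
from typing import Dict, List


def _at(seq: List[str], j: int) -> str:
    """Return seq[j], or a boundary marker when j falls outside the sentence."""
    if j < 0:
        return "<BOS>"
    if j >= len(seq):
        return "<EOS>"
    return seq[j]


def token_features(words: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Build sparse features for position i from the word and its POS context.

    The templates are hypothetical examples of unigram and compound features.
    """
    word = words[i].lower()
    p_prev = _at(pos_tags, i - 1)
    p_cur = _at(pos_tags, i)
    p_next = _at(pos_tags, i + 1)
    return {
        "w0": word,                       # current word
        "p0": p_cur,                      # current part of speech
        "p-1": p_prev,                    # previous part of speech
        "p+1": p_next,                    # next part of speech
        "p-1|p0": f"{p_prev}|{p_cur}",    # compound feature: POS bigram to the left
        "p0|p+1": f"{p_cur}|{p_next}",    # compound feature: POS bigram to the right
        "w0|p0": f"{word}|{p_cur}",       # compound feature: word with its POS
    }


def sentence_features(words: List[str], pos_tags: List[str]) -> List[Dict[str, str]]:
    """Feature dictionaries for every token; a CRF toolkit would consume these."""
    return [token_features(words, pos_tags, i) for i in range(len(words))]


# Illustrative sentence with a subject-verb agreement error ("pumps is").
words = ["The", "pumps", "is", "running"]
pos_tags = ["DT", "NNS", "VBZ", "VBG"]
features = sentence_features(words, pos_tags)
```

In this example the compound feature `p-1|p0` at the verb takes the value `NNS|VBZ`, a plural noun followed by a third-person singular verb, which is the kind of local evidence a sequence labeler could associate with an agreement error.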

In this study, the Language Technology Platform (LTP) is used to carry out part-of-speech tagging on the text corpus. LTP uses the 863 part-of-speech tag set, and the meaning of each part-of-speech tag is shown in Table 1.

Meanwhile, CRF is used to model the dependency relationship between tags. Lastly, the softmax layer and the CRF layer are combined at the output end. Figure 2 shows the framework of this model. In this framework, the LSTM (long short-term memory) layer is used to calculate the feature scores in the CRF (conditional random field) layer, which are called neural features. These neural features are similar to traditional sparse CRF features and are used directly to calculate the score of a given tag sequence.

Dynamic programming can be effectively used for calculation and inference of optimal tag sequences. Then, the modified CRF layer models the conditional probability of the possible output sequence S on the input sequence X as
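The equation itself is not preserved in this text. A common form for LSTM-CRF taggers, assumed here rather than taken from the source, is P(S | X) = exp(score(X, S)) / sum over all S' of exp(score(X, S')), where score(X, S) adds the per-position scores from the LSTM layer and the tag-to-tag transition scores. The sketch below illustrates the two dynamic programs this relies on: the forward algorithm for the normalizing sum and Viterbi decoding for the optimal tag sequence. The names `emit` and `trans` and all numeric values are illustrative assumptions.

```python
import math
from typing import List


def sequence_score(emit: List[List[float]], trans: List[List[float]], tags: List[int]) -> float:
    """Score of one tag sequence: emission scores plus transition scores."""
    score = emit[0][tags[0]]
    for t in range(1, len(tags)):
        score += trans[tags[t - 1]][tags[t]] + emit[t][tags[t]]
    return score


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emit: List[List[float]], trans: List[List[float]]) -> float:
    """Forward algorithm: log of the sum of exp(score) over all tag sequences."""
    k = len(emit[0])
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            _logsumexp([alpha[i] + trans[i][j] for i in range(k)]) + emit[t][j]
            for j in range(k)
        ]
    return _logsumexp(alpha)


def conditional_prob(emit: List[List[float]], trans: List[List[float]], tags: List[int]) -> float:
    """P(S | X) under the globally normalized model."""
    return math.exp(sequence_score(emit, trans, tags) - log_partition(emit, trans))


def viterbi(emit: List[List[float]], trans: List[List[float]]) -> List[int]:
    """Viterbi decoding: the highest-scoring tag sequence."""
    k = len(emit[0])
    best = list(emit[0])
    backpointers = []
    for t in range(1, len(emit)):
        pointers, nxt = [], []
        for j in range(k):
            i_best = max(range(k), key=lambda i: best[i] + trans[i][j])
            pointers.append(i_best)
            nxt.append(best[i_best] + trans[i_best][j] + emit[t][j])
        backpointers.append(pointers)
        best = nxt
    path = [max(range(k), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative scores: 3 positions, 2 tags (0 = correct, 1 = error).
emit = [[2.0, 0.5], [0.3, 1.8], [1.5, 0.2]]   # stands in for the LSTM outputs
trans = [[0.5, -0.2], [-0.4, 0.6]]            # tag-to-tag transition scores
best_path = viterbi(emit, trans)
best_prob = conditional_prob(emit, trans, best_path)
```

Both routines run in time proportional to the sequence length times the square of the number of tags, which is why exact inference stays tractable even though the number of possible tag sequences grows exponentially.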

The algorithm of automatic error detection is shown in Table 2.

Multitask learning can improve the performance and generalization ability of the model on each task by constructing auxiliary tasks or joint tasks and solving multiple tasks by sharing the main parameters of the model. Because of the correlation between tasks, multitask learning is equivalent to implicit data enhancement. Figure 3 shows that the output layer of the model is divided into a mother tongue classification layer and a phoneme sequence annotation layer by means of hard parameter sharing, and the main coding module of the model plays a role in the form of shared parameters. The model can learn the phoneme sequence tagging task and mother tongue classification task simultaneously, so the model can effectively learn the phonetic features of different mother tongues and improve the generalization ability of the model in the sequence tagging task.

The global sample representation vector is calculated as the weighted sum of all hidden states. Lastly, the whole-sample representation vector C is input into the fully connected layer for native language classification to obtain the final classification result.
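The weighted sum can be sketched as follows. The source does not state how the weights are scored, so dot-product scoring against a learned query vector followed by a softmax is an assumption made for illustration; `attention_pool`, `query`, and the example values are hypothetical names and numbers.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[List[float]], query: List[float]) -> Tuple[List[float], List[float]]:
    """Weighted sum of hidden states.

    hidden: one vector per audio frame (the encoder's hidden states).
    query:  assumed learned vector used to score each hidden state.
    Returns the pooled vector C and the attention weights.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights


# Three hidden states of dimension 2 (illustrative values only).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(hidden_states, query=[1.0, 0.5])
```

The pooled vector has the same dimension as a single hidden state regardless of the number of frames, so it can be passed to a fixed-size classification layer.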

In the comparison of multitask mispronunciation detection and diagnosis model experiment, the following two aspects are mainly compared. On the one hand, the classification layer of the mother tongue is the full connection layer (MT-D) or the attention mechanism layer (MT-A). On the other hand, the phoneme sequence annotation layer either uses full connection or postprocessing network (MT-D-P, MT-A-P) as described in the previous section.

It can be seen from Table 3 that in the multitask model, the attention mechanism has the best effect on the classification layer of the mother tongue. In the comparison between the postprocessing network and full connection, we find that the postprocessing network can significantly improve the accuracy of model sequence annotation, but the accuracy of model mispronunciation detection and diagnosis is not significantly improved. It can be seen that in the multitask model, too many output levels of a single task will negatively affect the generalization effect of multitask.

4. Example Verification

The acquisition of experimental datasets is also one of the main links in the preparation of the experiment. The spoken Arabic digit dataset is selected as the experimental dataset, which contains a large amount of English translation data. In order to ensure the accuracy of experimental conclusions, multiple groups of experimental data were set, as shown in Table 4.

The designed system and the BP neural network-based automatic English translation judgment system are both used to automatically detect translation errors in English translation, and the accuracy and misjudgment rate of automatic translation error detection of the two systems are verified. The test results are shown in Figure 4.

As shown in Figure 4(a), the accuracy of automatic translation error detection obtained by the designed system is up to 100%, while that obtained by the BP neural system is only 80%. As shown in Figure 4(b), the misjudgment rate of the designed system in detecting translation errors is less than 10%, which is lower than that of the BP system. This indicates that the designed system has higher detection accuracy and a lower misjudgment rate and therefore detects translation errors in English translation better.

A total of 500 pieces of translated text, containing 1020 spelling errors, were used for testing. The scheme described was used to check spelling errors. The experimental results are shown in Figure 5.

It can be seen from the experimental results that, when spelling errors are checked with the dictionary-based method and words that do not appear in the dictionary are stemmed, the accuracy of nonword error detection is relatively high, reaching 87.6%, even with a dictionary of limited size. Simple nonword error detection is therefore highly feasible. Real-word error detection was also tested in this paper; however, because a test set is difficult to construct, experimental results are not given here. The conclusion is that it is difficult to achieve high accuracy for real-word error detection.
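Dictionary-based nonword detection with edit-distance suggestions can be sketched as follows. This is a minimal illustration, not the system evaluated above: the tiny dictionary, the distance threshold, and the function names are assumptions, and the stemming step mentioned in the text is omitted.

```python
from typing import Dict, List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def check_spelling(tokens: List[str], dictionary: Set[str],
                   max_distance: int = 2, top_k: int = 3) -> Dict[str, List[str]]:
    """Flag tokens missing from the dictionary and suggest close dictionary words."""
    report: Dict[str, List[str]] = {}
    for token in tokens:
        word = token.lower()
        if not word.isalpha() or word in dictionary:
            continue  # skip numbers, punctuation, and known words
        scored = sorted((edit_distance(word, cand), cand) for cand in dictionary)
        report[token] = [cand for dist, cand in scored if dist <= max_distance][:top_k]
    return report


# Hypothetical mini-dictionary and a sentence with two nonword errors.
dictionary = {"the", "valve", "pressure", "controls", "control"}
tokens = "The vlave controls the presure".split()
report = check_spelling(tokens, dictionary)
```

Because a word is flagged only when it is absent from the dictionary, this approach cannot catch real-word errors such as "form" written for "from", which is consistent with the difficulty noted above.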

In the comparison experiment, the optimal labeling results were obtained with feature template T9, and the Fa + TH value was 5.6% higher than the maximum flag (see Figure 6).

The model does not normalize at every node but normalizes globally over all features, so it can reach the global optimum, and its performance is better than that of the maximum entropy model. In addition, the deep learning engineering English translation error detection model with multivariate compound features also has better convergence ability. Templates T7, T8, and T9 each introduce different compound features; after the compound features are added, the experimental results also improve slightly, and the precision, recall, and F value reach 79.2%, 79.5%, and 79.4%, respectively. This shows that the deep learning engineering English translation error detection model can make full use of multilevel resources and has a good ability to describe long-range associations.
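For reference, the relation between the three figures can be sketched as follows, assuming that the first figure (79.2%) is precision, the second is recall, and the F value is the balanced F-measure. The counts in the second example are invented for illustration.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure; beta = 1 gives the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# F value implied by the rounded precision and recall reported above.
implied_f = f_measure(79.2, 79.5)

# Hypothetical counts: 8 errors found correctly, 2 false alarms, 2 misses.
p, r = precision_recall(tp=8, fp=2, fn=2)
f = f_measure(p, r)
```

With the rounded inputs the implied F value is about 79.35, in line with the reported 79.4% once rounding of the underlying precision and recall is taken into account.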

In addition, as shown in Figure 7, the error rejection rate of the best pronunciation fit evaluation algorithm is only 22.95%. It can be seen that L2-Arctic is still a challenging data set, because it contains the data of English spoken by people from different native countries, which results in a lot of difficulty distinguishing pronunciation in the audio. Acoustic models of unsupervised mispronunciation detection are trained only on standard pronunciation and are not good at detecting unfamiliar mispronunciation.

On the other hand, this study also found that there are some defects in the algorithm for detecting incorrect pronunciation. As shown in Figure 8, when users read the word “ROOM,” they correctly read the “R” phoneme but incorrectly inserted the “G” sound. However, the premise of mispronunciation detection is that the user only pronounces the standard phoneme corresponding to the word, so in forced alignment, the “R” phoneme will be given a low score due to the existence of the “G” sound. This kind of feedback can be confusing to users because there is nothing wrong with the “R” sound but just the insertion of the wrong sound. The main goal of the mispronunciation diagnosis task is to output the phoneme sequence of the actual pronunciation of the user and compare it with the standard phoneme sequence so as to bring the correct feedback to the user.

The task of phoneme sequence annotation can also be regarded as the task of classification on each audio frame. The classification results of all audio frames in the test set were statistically analyzed to form the confusion matrix as shown in Figure 8. It can be seen that the classification of standard phonemes is basically accurate, but the classification result of the error tag is very poor. Because the error tag is an additional tag in the L2-Arctic data set, it represents nonstandard English pronunciation phonemes in all languages. It is considered that the error tag is too broad for the deep learning model, so it is difficult for the model to learn effective information.

5. Conclusion

Engineering English translation is at present the most frequently used application of deep learning translation, but the probability of translation errors is still high. Because the detection performance of existing detection systems is poor, a new deep learning-based automatic error detection system for engineering English translation is designed. The experimental data show that the designed system has higher translation error detection accuracy and a lower misjudgment rate, which indicates that the system design is feasible and can provide some help and support for the future development and application of deep learning translation. The deep learning method is adopted to classify translations according to sentence error types, and different penalty weights are given to different error types. In the next step, the translation is scored according to the deduction criteria of manual scoring. Finally, the translation is qualitatively evaluated according to the scoring result and the minimum value. This method can reduce, to some extent, the impact of simple translations on the overall quality of batch tasks.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This work was supported by the Scientific Research Program and funded by the Shaanxi Provincial Education Bureau: Xi’an Tour Text Translation Strategy Research in Terms of Prototype and Model Theory (Program no. 18JK0298).