Abstract

With the increased demand for English translators in recent years, more scholars and researchers have begun to focus on college English translation education. The demand for computers that perform automatic natural language translation is at an all-time high, presenting new potential for machine translation (MT) research. Although neural machine translation (NMT) technology has advanced significantly, it still has several flaws. Recurrent-NN-based MT technology is not optimal for long-sentence translation and suffers from problems such as missing translation and overtranslation. This paper therefore proposes college English translation based on a convolutional neural network (CNN), arguing that CNNs have stronger feature extraction and information processing capabilities than other deep NNs. The proposed approach effectively mitigates missing translation and overtranslation, problems that lower the quality of college English translation instruction. The experimental results of this research reveal that as English sentences grow longer, translation accuracy based on the recurrent NN drops to 47.9%, while translation accuracy based on the CNN remains at 89.7%. College English translation based on CNNs can thus be integrated into English translation teaching as an innovation over traditional teaching methods, creating a better learning environment for students and enhancing learning effects. MT based on CNNs greatly improved students' performance, which is conducive to improving the quality of college English translation teaching.

1. Introduction

English classroom teaching is one of the areas where breakthroughs are most needed. In today's international financial and commercial exchanges, the requirements for comprehensive English application ability are constantly increasing, and college English translation ability plays a key role among them. In recent years, the demand for English translation talent has been growing, and the problem of translation education has gradually attracted the attention of academic circles. In college English education, translation education has always received attention, but its educational effect is not ideal.

As a result, English teachers and scholars are concerned about how to improve students' translation abilities. This study suggests that English majors be taught to translate using a college English translation approach based on CNNs. Although NMT brings benefits to the field of MT, it converts between natural languages using a single NN, which has a number of flaws. This research therefore uses an MT model based on CNNs to improve MT performance.

Translation is a compulsory course for English majors and is considered a highly practical one; the cultivation of translation ability is part of comprehensive language ability. With the constant development of computer information technology, according to Yao, people's lives have been significantly altered, and computer-aided translation (CAT) technology is currently being used as an important supplementary tool, including attempts to adapt it to the teaching process [1]. Wang studied in depth the design of English translation training using computer-aided translation software; with the help of corpus statistics software, he performed a complete data analysis of the translation and the original text [2]. Najjar discussed the translation of the Qur'an in Chinese and English and studied the morphological transformation of hyperbolic patterns, seeking to determine the impact of translation strategies on translation quality [3]. In cultural translation, Zhang observed that people see translation as a cross-cultural communicative activity; translation thought shifts dramatically across diverse cultural situations, and the high value the translation industry places on cultural differences is evidenced by recent translation studies [4]. Fitriani's research classified and described grammatical faults in English-translated sentences in terms of syntax and morphology, with the surface strategy taxonomy guiding the analytical approach [5]. Susini found that implicit meaning is one of the linguistic phenomena that must be overcome in translation; his research investigated the structures in which implicit meaning is realized in English [6]. Although scholars agree that English translation is very important, there is a lack of research on how to improve the effect of college English translation.

MT is the process of converting sentences from a source language to a target language using a computer, helping people communicate across languages. Chen proposed the Residual Encoder-Decoder CNN (RED-CNN), inspired by deep learning; his RED-CNN model produced excellent results [7]. Kruthiventi noted that understanding and predicting human visual attention mechanisms has received attention in neuroscience; a large receptive field can capture semantics at multiple scales while taking the global context into account [8]. Lu created a deep learning technique that automatically segments the liver in CT scans with graph-cut refinement, leveraging graph cuts and previously learned probability maps for precise refinement of the initial segmentation [9]. Deep learning is widely employed in a variety of fields and has produced numerous results.

In recent years, scholars have begun to reconsider many of the main modules of MT tasks from the new perspective of deep learning. Many studies have shown that deep learning can effectively resolve various bottlenecks of previous MT methods. As a result, NMT received attention immediately after it was proposed, and its performance improved greatly in just a few years. The innovation of this paper is to analyze how the CNN plays a role in MT technology, thereby improving the teaching quality of college English translation. Experimental analyses of deep learning-based recurrent networks and CNNs are carried out.

2. CNN MT Technology Based on Deep Learning

The ultimate goal of language learning is communication, and translation is one of the most important forms of cross-cultural communication for foreign language students. As a basic skill for comprehensive language use, translation ability is essential for English majors. Because college students' future employment will require them to work in various fields, to learn about the latest achievements of other countries, or to present China's latest achievements to the world, college English translation is crucial for them.

MT is one of the earliest ambitions of computer scientists, dating back to the early days of computing, and it has gained new vitality under the urgent needs of world economic development and cultural exchange [10]. MT has gone through a tortuous development path and has made great progress. Current NN MT technology includes recurrent NNs, feedforward NNs, and CNNs. However, judging from the current situation, there are still many problems in MT that need to be solved as soon as possible [11]. The structure of NN MT is shown in Figure 1.

Deep learning is a relatively recent research area in machine learning, introduced to move machine learning closer to its original goal of artificial intelligence. Deep learning has advanced rapidly in the disciplines of image processing and speech recognition since 2013. It has recently been found to be more effective at reducing the problems of linear inseparability, the lack of correct semantic representation, and manual feature design. It can help statistical MT overcome the challenges of fully utilizing nonlocal context, data sparseness, and error propagation. MT is a popular topic in academia right now [12].

2.1. Algorithm Based on Recurrent NN

In the traditional feedforward NN model, adjacent layers are fully connected, while the nodes within each layer are disconnected. A recurrent NN is a network that contains both feedforward and feedback pathways. Its feedforward pathway is similar to the traditional feedforward NN model, while the feedback pathway feeds the output of some neurons back to themselves as input at a later time step [13]. The schematic diagram of the recurrent NN is shown in Figure 2.

As demonstrated in Figure 2, feedforward NNs are the simplest type of artificial NN. According to the number of layers, a feedforward NN can be divided into single-layer and multilayer feedforward NNs. The recurrent NN, unlike the standard feedforward NN, pays attention to timing in the input. It adds a recurrent connection to the hidden layer, which means that the hidden layer's output from the previous moment is used as part of the hidden layer's current input. The calculation formula of the recurrent NN is as follows:
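
Since the equation itself is not reproduced above, a standard simple-recurrent formulation is given here as a sketch; the weight and bias symbols are assumptions rather than the paper's original notation:

$$h_t = f\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right), \qquad y_t = g\left(W_{hy} h_t + b_y\right),$$

where $x_t$ is the input at moment $t$, $h_t$ is the hidden state, and $y_t$ is the output.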

This way of connecting gives the recurrent NN obvious advantages for sequential input: the network can better capture timing information and obtain context information. Therefore, the recurrent NN is especially suitable for timing-focused fields such as speech, text, and video processing. The speech field includes speech recognition and language conversion; the text domain includes text classification and MT; the video domain includes video recognition and video classification [14].

It is precisely because of this special connection method that the recurrent NN has a serious problem: it cannot handle long-term dependencies. This is due to the continuous multiplication of gradients in the recurrent NN when the error is backpropagated. When the gradient factors are less than 1, the gradient is prone to vanish, and when they are greater than 1, the gradient is prone to explode [15]. The Long Short-Term Memory (LSTM) network was created to overcome the recurrent NN's long-term dependency problem and is an enhancement of the classic recurrent NN. LSTMs have a wide range of applications in science and technology, and many sequence tasks can be learned by LSTM-based systems. Figure 3 depicts the LSTM network.

The LSTM network differs from a standard recurrent NN in that the original recurrent unit is changed into a CEC memory unit, as shown in Figure 3.

To tackle the problem of gradient dispersion, the CEC memory unit's summation mechanism allows the gradient to be preserved while the error is propagated. Three gates are introduced in the LSTM network, where the forget gate determines whether or not the input should be forgotten [16, 17]. The forget gate's job is to pick which parts of the long-term memory (the output from the previous unit module) to keep and which to discard.

The input gate, forget gate, and output gate are computed as follows.
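
Since the original equations are not reproduced above, the standard LSTM formulation is given here as a sketch; the weight matrices $W$, biases $b$, sigmoid $\sigma$, and concatenation brackets are assumed notation:

$$i_t = \sigma\left(W_i [h_{t-1}; x_t] + b_i\right),$$
$$f_t = \sigma\left(W_f [h_{t-1}; x_t] + b_f\right),$$
$$o_t = \sigma\left(W_o [h_{t-1}; x_t] + b_o\right),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c [h_{t-1}; x_t] + b_c\right), \qquad h_t = o_t \odot \tanh(c_t).$$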

Here, $i$ is the input gate, $f$ is the forget gate, and $o$ is the output gate. After the three gates are computed, the memory unit $c_t$ and the hidden output $h_t$ are updated as in the last line above. After the widespread adoption of LSTM networks, many variants of the traditional recurrent NN followed, including GRU networks [18].

The GRU is a well-performing LSTM variant. It has a simpler structure than an LSTM network and produces excellent results. Unlike the LSTM network, which contains three gates, the GRU contains only two: an update gate and a reset gate. The update gate determines the proportions of the current input and of past memory in the hidden layer's output, while the reset gate determines how much of the previous state is ignored when computing the candidate state [19].

The update gate and reset gate are calculated as follows.
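
As a sketch of the standard GRU formulation (the symbols are assumed notation, since the original equations are not shown):

$$z_t = \sigma\left(W_z [h_{t-1}; x_t]\right), \qquad r_t = \sigma\left(W_r [h_{t-1}; x_t]\right),$$
$$\tilde{h}_t = \tanh\left(W [r_t \odot h_{t-1}; x_t]\right), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t.$$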

However, recurrent-NN-based translation still has many defects, such as missing translation, mistranslation, and overtranslation, which degrade the translation results [20].

2.2. MT Based on CNNs

CNNs are feedforward neural networks (NNs) with deep structures and convolutional computations, and they are among the most representative deep learning algorithms. A CNN-based MT model is therefore proposed to address the long-sentence translation defect, as shown in Figure 4.

As shown in Figure 4, CNNs are similar to regular NNs in that they are made up of neurons with learnable weights and bias constants. CNN models are usually built on feedforward NN models; the difference is that the "hidden layers" of traditional NNs are replaced by "convolutional layers," "pooling layers," and "fully connected layers." This unique structure enables CNNs to perform exceptionally well, as shown in Figure 5.

As shown in Figure 5, the convolution layer is followed by the activation layer, whose role is similar to that of the activation function in an ordinary NN. The ReLU activation function is generally used, although other activation functions such as Sigmoid are also possible.

The calculation formula of the neuron is as follows:
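
As a sketch of a standard convolution-plus-activation computation (the index and symbol conventions are assumptions):

$$a_{i,j} = f\left(\sum_{m}\sum_{n} w_{m,n}\, x_{i+m,\, j+n} + b\right).$$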

Among them, $w_{m,n}$ is the parameter of the convolution kernel, $b$ is the bias, and $f$ is the activation function.

In 2016, researchers proposed a new NMT model and successfully applied CNNs to MT; it is similar in concept to the Encoder-Decoder model based on the recurrent NN.

One CNN in the model serves as the Encoder, converting the source sentence into an intermediate vector. Another CNN serves as the Decoder, decoding the intermediate vector to generate the target sentence. The high-level CNN layers abstractly extract long-distance information. This realization of NMT has achieved outstanding results and brought significant development to the field of CNN-based MT.
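
To make the Encoder-Decoder division concrete, the following is a minimal PyTorch-style sketch of a convolutional encoder paired with an attention-based LSTM decoder, matching the components described in Sections 2.2.1 and 2.2.2. PyTorch itself, as well as all class names, dimensions, and hyperparameters, are illustrative assumptions, not the paper's actual configuration:

```python
# A minimal sketch of a convolutional Encoder and an attention-based LSTM
# Decoder for MT. Names, sizes, and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=256, n_layers=3, k=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # "Same" padding keeps one hidden state per source word; stacking
        # layers widens the receptive field to capture long-distance context.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim if i == 0 else hid_dim, hid_dim, k, padding=k // 2)
            for i in range(n_layers))

    def forward(self, src_ids):                  # (batch, src_len)
        x = self.embed(src_ids).transpose(1, 2)  # (batch, emb_dim, src_len)
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x.transpose(1, 2)                 # (batch, src_len, hid_dim)

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.LSTMCell(emb_dim + hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_ids, state, enc_states):
        h, c = state                              # h, c: (batch, hid_dim)
        # Dot-product attention over encoder states yields context info.
        scores = torch.bmm(enc_states, h.unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)    # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        h, c = self.cell(torch.cat([self.embed(prev_ids), context], 1), (h, c))
        return self.out(h), (h, c)                # logits over target vocab

# One decoding step on toy data.
enc, dec = ConvEncoder(8000), AttnDecoder(8000)
src = torch.randint(0, 8000, (2, 12))
h = c = torch.zeros(2, 256)
logits, (h, c) = dec(torch.zeros(2, dtype=torch.long), (h, c), enc(src))
```

In practice, the decoder step would be run in a loop, feeding each predicted word back as the next `prev_ids`, which mirrors the test-time behavior described in Section 2.2.2.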

2.2.1. Encoder Coding Based on CNN

The translation task is divided into two parts: understanding and expression. Encoder encoding is responsible for understanding, and Decoder decoding is responsible for expression. The input layer represents the input to the model: the input of the Encoder model is the sequence of vocabulary indices of the words obtained after segmenting and generalizing the source sentence, as shown in Figure 6.

As shown in Figure 6, the vocabulary is constructed based on the training data set. The embedding layer refers to the process of representing words as vectors: after the embedding layer, each input index is converted into a vector of a specified dimension. The embedding layer is calculated as follows:
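
A common formulation (the matrix symbol $E$ is an assumption) is a lookup in the embedding matrix:

$$e_j = E\, \mathrm{onehot}(x_j), \qquad E \in \mathbb{R}^{d \times |V|},$$

where $x_j$ is the index of the $j$-th word, $|V|$ is the vocabulary size, and $d$ is the embedding dimension.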

The embedding layer uses random initialization to assign the initial value to the word vector. After training, the required word vector is obtained to realize the representation of words with vectors.

The hidden layer was proposed along with the concept of multilayer networks, mainly to solve linearly inseparable problems. The bidirectional hidden layer actually contains two networks: a forward LSTM network and a reverse LSTM network. The bidirectional LSTM network aims to capture the input sentence information more comprehensively from the two perspectives of positive order and reverse order, so as to achieve a better understanding of the input sentence.

To aid in generating the output at the current moment, the output of the LSTM network at the previous moment is incorporated into the current moment's calculation. The forward network captures the preceding information in positive order, and the reverse network captures it in reverse order, as in the following formulas:
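
As a sketch, with $e_t$ the embedding at moment $t$ (the arrow notation is assumed):

$$\overrightarrow{h}_t = \mathrm{LSTM}\left(e_t, \overrightarrow{h}_{t-1}\right), \qquad \overleftarrow{h}_t = \mathrm{LSTM}\left(e_t, \overleftarrow{h}_{t+1}\right).$$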

The output of the forward LSTM network is fused with the output of the reverse LSTM network to obtain the output of the bidirectional hidden layer. It captures the context information and obtains the final vector representation, which is calculated as follows:
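
With $f$ the fusion function of the following paragraph, this can be written as:

$$h_t = f\left(\overrightarrow{h}_t, \overleftarrow{h}_t\right) = \left[\overrightarrow{h}_t; \overleftarrow{h}_t\right].$$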

This vector serves as the input to the next hidden layer; here, $f$ is taken to be the concatenation function.

In the discipline of natural language processing, deep networks attempt to extract abstract high-level features of language in order to gain a deeper comprehension of language. The network is calculated using the following formula:
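
Stacking hidden layers can be sketched as follows, where $l$ indexes the layer and $h^{(1)}_t = h_t$ is the bidirectional output above (the superscript notation is an assumption):

$$h^{(l)}_t = \mathrm{LSTM}\left(h^{(l-1)}_t,\, h^{(l)}_{t-1}\right), \qquad l \ge 2.$$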

In practical applications, it can be considered to continue to increase the number of hidden layers to obtain higher-level features according to requirements.

2.2.2. Decoder Decoding Based on CNN

The Encoder is pretrained through a NN to determine the initial value of W, with the goal of making the output value equal the input value. After Encoder encoding, the sequence of hidden-layer outputs corresponding to the source-sentence input at each moment is obtained, and the attention mechanism network uses it to compute contextual information. Encoder encoding also yields the cell state of the hidden layer. The Decoder part of the model is shown in Figure 7.

Figure 7 shows that the input of the Decoder model differs between the training phase and the testing phase. During testing, because the correct target sentence is unknown, the input of the model at the first moment is a special word that represents the beginning of the decoding process, and the input at each subsequent moment is the final output of the network at the previous moment.

The embedding layer refers to the process of representing words with vectors; it has the same form as the embedding layer of the Encoder network but corresponds to a different vocabulary. To a certain extent it performs dimensionality reduction, whose principle is matrix multiplication; in convolutional networks it can be understood as a special fully connected layer operation. After the embedding layer, each input index is converted into a vector of a specified dimension. The calculation of the embedding layer is shown in the following formula:
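
As with the Encoder, this can be sketched as a lookup in a target-side embedding matrix $E'$ (symbols assumed):

$$e'_t = E'\, \mathrm{onehot}(y_{t-1}),$$

where $y_{t-1}$ is the word output at the previous moment.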

In the hidden layer, a forward LSTM network is employed, and the number of layers can be modified depending on the situation. The hidden layer of the MT model's block-principle-based Decoder network contains two LSTM layers, and the LSTM network's cell state is initialized from the cell state of the Encoder network's second hidden layer.

Attention models are commonly employed in deep learning applications such as natural language processing, image recognition, and speech recognition, and they are an important technology worth studying in depth. The attention mechanism network is located in the Decoder network's hidden layer. Based on the output of the Decoder network's hidden layer at the previous moment, it continually combines the outputs of the Encoder network to calculate and provide context information appropriate to the present state of the Decoder network.

The context information of the current network is calculated from alignment scores between the Decoder's previous hidden state and the Encoder's output at each moment. These scores are all normalized to obtain the contribution of the Encoder network's output at each moment to the calculation of the Decoder network's hidden layer at the current moment; the context information corresponding to the current moment is then obtained as the correspondingly weighted sum.
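
A standard attention sketch (with $z_{t-1}$ the Decoder's previous hidden state, $h_s$ the Encoder outputs, and $a$ an assumed score function):

$$e_{t,s} = a\left(z_{t-1}, h_s\right), \qquad \alpha_{t,s} = \frac{\exp(e_{t,s})}{\sum_{s'} \exp(e_{t,s'})}, \qquad c_t = \sum_{s} \alpha_{t,s}\, h_s.$$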

Combining the output of the embedding layer at the current moment with this context information, the network output is calculated as follows:
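
As a sketch, using the symbols introduced above:

$$z_t = \mathrm{LSTM}\left(\left[e'_t; c_t\right],\, z_{t-1}\right).$$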

Each layer of the multilayer LSTM network is roughly the same: the network structure is identical, and the difference lies only in the specific inputs and outputs.

The output of the final hidden layer is as follows:
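
For an $L$-layer Decoder, this can be sketched analogously to the Encoder stack (superscripts assumed):

$$z^{(L)}_t = \mathrm{LSTM}\left(z^{(L-1)}_t,\, z^{(L)}_{t-1}\right).$$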

The normalized exponential function is commonly known as the softmax function. In multiclassification, it is a generalization of the binary classification function sigmoid, and its goal is to present the results of multiclassification as probabilities. The output layer is a fully connected layer whose output dimension equals the size of the target-language vocabulary.

The final output is activated by the Softmax function, as shown in the following formula:
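
With $o_t = W_o z_t + b_o$ the output-layer activations (symbols assumed), the standard softmax reads:

$$p(y_t = k) = \frac{\exp(o_{t,k})}{\sum_{j=1}^{|V|} \exp(o_{t,j})}.$$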

After Encoder encoding and Decoder decoding, backpropagation updates the network parameters, fits the training data, and captures MT features.

2.2.3. Cross-Entropy Loss

In classification problems, the cross-entropy loss function is frequently utilized, and cross-entropy is frequently used as a loss function in NN classification problems.

In machine learning and deep learning, entropy is a measure of the uncertainty of a random variable. Assume $a$ is a discrete random variable with a finite number of potential values; its probability distribution is as follows:
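
In standard notation (assumed here):

$$P(a = a_i) = p_i, \qquad i = 1, 2, \ldots, n.$$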

Then, the entropy of random variable a is calculated as follows:
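
As the standard formulation:

$$H(a) = -\sum_{i=1}^{n} p_i \log p_i.$$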

Here, the base of the log is 2 or the natural logarithm, and if $p_i$ is 0, the value of $p_i \log p_i$ is defined as 0. The distance between two random distributions is measured by cross-entropy. Assuming that random variable $A$ has a value set $U$ and that its two distributions are $p$ and $q$, the cross-entropy of the two random distributions is determined as follows:
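
In the standard form (notation assumed):

$$H(p, q) = -\sum_{u \in U} p(u) \log q(u).$$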

In deep learning, cross-entropy is frequently used to calculate the similarity between the network output and the ground truth. It calculates the cross-entropy loss to determine the current error and adjusts the parameters accordingly.

2.2.4. BLEU Score

A variety of automatic evaluation standards for translation technologies have been proposed. At present, the widely used and recognized evaluation standard is to use the BLEU algorithm for scoring and discrimination.

The BLEU algorithm is calculated as shown below, where $N$ represents the maximum order of the N-gram models, $w_n$ represents the weight of the corresponding N-gram model (usually $1/N$), and $p_n$ represents the matching accuracy of the corresponding model; BP is the length penalty factor given in the second formula.
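
The widely used formulation, reproduced here in standard notation, is:

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad \mathrm{BP} = \begin{cases} 1, & c > r, \\ e^{\,1 - r/c}, & c \le r. \end{cases}$$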

Here, the length of the translation to be evaluated is $c$, and the length of the reference translation is $r$. The length penalty factor BP depends on the relationship between $c$ and $r$ and is a piecewise function.

Since any n-gram model does not match, the BLEU value in this case is 0, which is meaningless. Therefore, the BLEU algorithm is not suitable for measuring the translation of a single sentence but is suitable for evaluating the translation of many sentences.
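
This behavior can be checked with NLTK's BLEU implementation; the toy sentences below are invented purely for illustration:

```python
# Corpus-level BLEU pools n-gram counts across all sentence pairs, so a
# single sentence with no 4-gram match does not zero out the score.
from nltk.translate.bleu_score import corpus_bleu, sentence_bleu

references = [[["the", "cat", "sits", "on", "the", "mat"]],
              [["students", "improve", "translation", "skills"]]]
hypotheses = [["the", "cat", "sits", "on", "the", "mat"],
              ["students", "improve", "their", "translation", "skills"]]

print(corpus_bleu(references, hypotheses))   # clearly positive

# Sentence-level BLEU on the second pair alone: it has no matching
# 3-/4-grams, so the geometric mean collapses toward 0 (NLTK warns).
print(sentence_bleu(references[1], hypotheses[1]))
```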

3. College English Translation Teaching Experiment Based on CNN

3.1. Comparative Experiment of CNN and Other English Translation Models

The experiment employs Ubuntu 14.04 LTS as the experimental environment and LuaJIT for NN initialization and NMT model development. The NMT model based on a CNN is then adjusted and built accordingly. This section mainly compares and analyzes the results of the corresponding experiments.

The application data come from the machine text translation public dataset of the Global AI Challenge. The original dataset contains a total of 10 million Chinese-English parallel sentence pairs. It limits the source and target language sentences in the training data to no more than 60 words.

The two MT models were trained on sentences of different lengths with the parameters held constant during training, and the efficiency of the two translation models on the data set was compared. The comparison experiment of the two models is shown in Figure 8.

As illustrated in Figure 8, MT technology based on the recurrent NN performs well on short-sentence translation. However, its translation quality on long sentences is unsatisfactory. MT technology based on recurrent NNs has severe difficulty with long-sentence translation; this has a significant impact on MT quality and is an area that needs to be improved.

According to the findings of the experiments, the CNN described in this study can partially solve the problem that NMT is sensitive to sentence length. Simultaneously, the multisequence coding method incorporates lexical and syntactic information into the NN to further guide the generation of translations by using the CNN to encode related sequences other than the source language sentences in parallel. Thus, the translation performance is improved to a certain extent.

In the experiment, the recurrent NN and the CNN-based translation model are compared, and the experimental data are used to identify each model's relevant benefits and characteristics, as shown in Table 1.

Table 1 shows the following: due to the recurrent NN's timing mechanism, the calculation of the output at the present moment depends on the output at the previous moment. The stacking of context information in long sequences is prone to information loss and information disturbance, resulting in low-quality translation results. In contrast, statements are dynamically segmented according to the specific conditions of each source statement: compared with the full source sentence, a sentence block successfully removes redundant information interference, and the sentence length is shortened. The network can thus better understand the meaning that the source sentence intends to express, improving the effect of MT.

When the experimental models were built, identical English texts were used as the test set, and corresponding model tests were carried out with the different experimental models. The quality and performance of each model are analyzed by comparing the different translations it generates. In this paper, the quality of sentences at different length levels is analyzed, and semantic extraction, semantic expression, and sentence fluency are compared, as shown in Figure 9.

As shown in Figure 9, from the translation effect of the two translation models, the English translation model based on CNN shows better translation quality than other models in terms of semantic extraction and semantic expression of sentences. Moreover, the contextual relationship of related words in the sentence and the choice of translation also show certain advantages. It greatly guarantees the fluency of sentences while delivering the semantics correctly, illustrating the ability of LSTM for deep semantic encoding and long-term memory.

3.2. Comparison Experiment of MT Technology

This study reveals the translation results of MT models using BLEU scores.

Overfitting can be mitigated by using dropout (the neuron drop probability), and the most appropriate dropout value depends on the available data and application context. The model's cross-entropy loss and BLEU score on the training dataset over the first 5 iterations were compared with dropout set to 0.2, 0.5, and 0.7, as shown in Tables 2 and 3.

As shown in Tables 2 and 3, dropout can help avoid overfitting to a certain extent. Since the number of hidden layers is set to two, the number of model layers is small. Weighing the experimental results, model training time, and overfitting risk, dropout is set to 0.2, which limits overfitting to an acceptable level.

Table 4 shows the experimental findings of the MT model comparison in this paper:

Table 4 shows the cross-entropy loss of each translation model on the training dataset, although the fit is poor. The BLEU scores on the training dataset were similar when each model stopped training, and the MT model based on the CNN provides a better translation effect than the recurrent NN model. The experimental findings objectively indicate the effectiveness of the CNN-based MT model.

3.3. Effect of CNN in College English Translation Teaching Management

This paper conducts experiments on two English-major classes, A and B. There are 40 students in each class, the time period is half a semester, and the experimental comparison is based on midterm exam results. Class A is taught by traditional manual translation, and class B is taught with the MT-based teaching management mode built on the CNN.

This paper surveys the two classes' preferences for the MT-based teaching management mode versus traditional manual translation teaching management, as shown in Tables 5 and 6.

As shown in Tables 5 and 6, for the MT-based teaching management mode, 35 students like it very much, accounting for 43.75%, and only 3 students expressed dislike, accounting for 3.75%. For the traditional teacher translation teaching mode, only 10 students expressed that they like it very much, accounting for 12.50%, while 14 students expressed that they disliked it very much, accounting for 17.50%. It can be seen that the teaching management of traditional teacher translation is not popular.

This article compares the scores of the two classes after one semester, as shown in Figure 10.

As shown in Figure 10, a comparison of the average grades of Class A and Class B at the beginning of the semester found that the students in both classes were at the same level of translation. After the period of study, however, the students' translation skills improved considerably, especially in Class B. The application of different teaching methods is the reason for this difference. Therefore, it can be concluded that applying this method in teaching fully improves students' translation level.

4. Conclusions

With society's increasing demand for professional English translation knowledge, college English translation education is becoming more important at major schools and universities. The use of neural networks in English translation systems has been found to improve the quality and accuracy of English translation, while traditional statistical machine translation can no longer meet the needs of modern society. Through these analyses, CNNs have been found to help improve the accuracy of MT. The experimental section compares the CNN with the recurrent-NN-based MT model and finds that the CNN's translation ability is stronger across different sentence lengths. Finally, the CNN-based translation model is applied to an English translation class, and a teaching comparison is carried out, which finds that the CNN-based translation model yields good teaching quality. Due to the author's limitations, there are still flaws in many aspects, and the author will strive to do better in future work.

Data Availability

The data used to support the findings of this study are available from the author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This research study was sponsored by Henan Provincial Medical Education Research Project in 2020. The name of the project is Research on the Path of Empowering the Doctor-Patient Communication and Humanistic Competence of General Medical Students under the Fusion of Narrative Medicine and Curriculum Ideological and Political Vision under Project no. Wjlx2020318. The author thanks the project for supporting this article.