Abstract

To improve the effect of Chinese-English machine translation, this paper combines an attention mechanism with a neural network algorithm and applies the combination to Chinese-English machine translation. Moreover, this paper uses a Gaussian distribution in place of the chi-square distribution to analyze the approximation error introduced in the Chinese and English speech energy detection method. In addition, the overall and pointwise approximation errors are studied by establishing a normalized mean square error function and an absolute error function, respectively. Finally, this paper proposes a new machine translation model based on logarithmic position representation and the self-attention mechanism. The experimental research shows that the proposed Chinese-English machine translation model, which integrates the attention mechanism and a bidirectional neural network, achieves good practical translation performance.

1. Introduction

When processing Chinese and English, splitting English sentences is relatively easy [1]. The hardest part is reorganizing the simple sentences that have been split, and it is easy to be so constrained by the grammatical structure of English that no bold changes are made: nouns in the original text are translated into nouns in the target language, verbs into verbs [2], and adverbs into adverbs. Sometimes even the structure of an attributive clause is copied wholesale from the original. Chinese translated in this way is obscure and hard to understand, the words fail to convey the meaning, and the result does not conform to Chinese language habits; it reads as heavily translationese. Therefore, we should try our best to avoid falling into the trap of translationese [3]. One important translation technique is part-of-speech conversion: almost any part of speech in the original text can be converted into another part of speech in the target language, such as verbs into nouns, nouns into verbs, prepositions or adverbs into verbs, and nouns into adjectives. Part-of-speech conversion is a necessary means of producing a good translation, because it allows the translation to break away from the structure of the original text rather than being bound by it, so the translated text naturally avoids traces of translation as much as possible.

This paper combines an attention mechanism with a neural network algorithm and applies the combination to Chinese-English machine translation, in order to improve the effect of Chinese-English machine translation and to provide a reference for the development of subsequent artificial intelligence translation systems.

2. Related Work

The convolutional neural network proposed in [4] achieved the best performance on object classification in translation datasets. With the further development of artificial intelligence translation, most studies using convolutional neural networks to learn high-level semantic features in intelligent translation have been proposed to complete specific tasks such as classification [5]. Most of these methods stack several convolutional layers, each followed by a pooling layer, and finally select the more important features and output them through a fully connected layer [6]. Reference [7] proposed another classic convolutional neural network, which achieved good results in translation localization and classification tasks and provided a new reference standard for the structural design of deep models. Reference [8] used smaller convolution kernels and designed a deeper model so that it could better learn the high-level semantics of intelligently translated content. Reference [9] further increased the depth of the model and improved its overall performance. Generally speaking, if a convolutional network is to improve its representation learning ability, it mainly relies on increasing the number of output channels, which consumes a large amount of computing resources and easily causes overfitting. Reference [10] uses a convolutional branch network to organize information across channels, which not only improves the performance of the model but also greatly reduces the sharp increase in the number of parameters caused by deepening and widening the model.

With the development of neural network research, some studies have begun to build neural topic models to represent local intelligent translation semantics; that is, object-level intelligent translation content topic models are derived from natural language processing and can discover or learn statistical models of the abstract topics in documents [11]. Reference [12] proposed an unsupervised neural topic model, the document neural autoregressive distribution estimator (DocNADE), which fuses a topic model and a neural network. It assumes that the generation of each word depends only on the words generated before it and directly models the document as the product of the conditional probabilities of all words, where each conditional probability is generated by a feedforward neural network, thereby obtaining object-level topic features with high-level semantic information. Reference [13] introduced the DocNADE model into the field of translation understanding and proposed translation classification and annotation based on it, also achieving good performance. Translation understanding based on deep learning has achieved fruitful results and has narrowed the translation semantic gap to a certain extent, but there is still room for improvement. Reference [14] deepened the network to an unprecedented 152 layers, directly reducing the translation classification error rate on ImageNet to less than 5%, approaching or even surpassing human recognition accuracy. With the help of deep learning, object-level analysis has also achieved great breakthroughs. Object detection is the basic method of object-level analysis, and it too has entered a stage of rapid development. For the past decade, the traditional machine intelligent translation field usually used feature descriptors for object recognition tasks, and progress was slow [15]. Owing to the excellent performance of convolutional neural networks on ImageNet, reference [16] generalized the ability of AlexNet, trained on ImageNet, to the field of object detection, which is transfer learning. The proposal of R-CNN is a milestone in the application of convolutional neural networks to target detection and the pioneering work of using deep learning for target detection. After this, a series of far-reaching object detection models based on R-CNN, as well as regression methods based entirely on deep learning, were proposed [17]. It is precisely with the help of deep learning models trained on large amounts of ImageNet data that the high-level semantics of intelligent translation content can be better captured, so that the models also achieve good results when transferred to other tasks or datasets [18]. Transfer learning lays the foundation for improvement on most tasks in computer intelligent translation. Not only in object detection but also in other computer intelligent translation tasks, transfer learning is widely used, especially in mining intelligent translation semantic information, including translation description and translation question answering as well as video description, video understanding, and video question answering. As a result, research on intelligent translation semantic understanding has been greatly advanced [19].

3. Intelligent Chinese and English Audio Spectrum Perception

The principle of Chinese and English sound waveform energy detection is mainly based on the chi-square distribution theory. The Chinese and English sound waveform energy of the received signal of cognitive users obeys the chi-square distribution:

Among them, γ represents the signal-to-noise ratio of the received signal of the cognitive user, and the two distributions above are the central and the noncentral chi-square distribution, respectively, both with 2TW degrees of freedom; the noncentrality parameter of the latter is 2γ. For simplicity of description, the time-bandwidth product TW in the above formula is denoted by a single symbol throughout the following text [20].

In the additive white Gaussian noise (AWGN) channel, the following important probability formulas can be obtained according to the above probability distribution functions, and these formulas can also express the spectrum sensing performance of cognitive users:

The detection probability P_d represents the probability that a cognitive user correctly detects the presence of an authorized user. A high detection probability indicates that cognitive users can accurately perceive the presence of authorized users and can therefore either remain in the sensing state to continue detection or release the spectrum in use, reducing interference to authorized users.

The false alarm probability P_f represents the probability of a false alarm (type 1 error) by the cognitive user when the authorized user does not exist, that is, the probability that the cognitive user believes the authorized user is present at that time. A low false alarm probability indicates that cognitive users can more accurately perceive spectrum holes and thus perform dynamic spectrum access.

The missed alarm probability P_m represents the probability of a missed detection (type 2 error) by the cognitive user when the authorized user exists, that is, the probability that the cognitive user fails to detect the existence of the authorized user. For the same cognitive device, the missed alarm probability and the detection probability sum to a constant 1.

Γ(·) and Γ(·,·) in formula (4) represent the gamma function and the incomplete gamma function, respectively, and the function in formula (3) is the generalized Marcum Q-function. The λ in the above two formulas is the threshold of the Chinese and English sound waveform energy detector.
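As an illustration, the following minimal Python sketch evaluates the single-user energy-detection probabilities described above under the standard chi-square model (central chi-square with 2TW degrees of freedom under H0, noncentral chi-square with noncentrality 2γ under H1). The function names, the time-bandwidth product, the SNR value, and the threshold choice are illustrative assumptions rather than the settings used in this paper.

# Sketch of the single-user energy-detection probabilities described above,
# assuming the usual chi-square model: under H0 the energy statistic is central
# chi-square with 2*TW degrees of freedom; under H1 it is noncentral chi-square
# with noncentrality 2*snr. All parameter values are illustrative.
from scipy import stats
from scipy.special import gammaincc


def false_alarm_prob(lam, tw):
    # P_f = Gamma(TW, lam/2) / Gamma(TW): regularized upper incomplete gamma.
    return gammaincc(tw, lam / 2.0)


def detection_prob(lam, tw, snr):
    # P_d = P(Y > lam | H1) with Y ~ noncentral chi-square(2*TW, 2*snr).
    return stats.ncx2.sf(lam, df=2 * tw, nc=2 * snr)


def missed_detection_prob(lam, tw, snr):
    # P_m = 1 - P_d.
    return 1.0 - detection_prob(lam, tw, snr)


tw, snr = 5, 10 ** (-5 / 10)                 # time-bandwidth product, SNR of -5 dB
lam = stats.chi2.ppf(1 - 0.1, df=2 * tw)     # threshold giving P_f = 0.1
print(false_alarm_prob(lam, tw), detection_prob(lam, tw, snr), missed_detection_prob(lam, tw, snr))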

Many research methods on cooperative spectrum sensing are based on a centralized cognitive wireless network, that is, a network consisting of a decision center and cognitive users, in which each cognitive user reports its own spectrum sensing result to the decision center. The decision center fuses this information using fusion criteria such as Chinese and English sound waveform energy fusion or decision fusion to make a final decision on whether an authorized user exists. Decision fusion strategies can be mainly divided into the following categories (a minimal sketch of these rules in code is given after the list):

“Majority” rule (majority-rule): if more than a certain number of cognitive users believe that the authorized user exists, that is, their local decision is D1, then the final decision of the decision center is D1.

AND-rule: only when all cognitive users in the network decide D1 does the decision center decide D1.

OR-rule: as long as one cognitive user in the network decides D1, the decision center decides D1.
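A minimal sketch of the three fusion rules above, operating on binary local decisions (1 means the user decides D1, 0 means D0); the majority threshold k is an illustrative parameter:

# Sketch of the three decision-fusion rules listed above, applied to a list of
# binary local decisions (1 = the user decides D1, 0 = D0).
def or_rule(decisions):
    # Decide D1 if at least one cognitive user decides D1.
    return int(any(decisions))


def and_rule(decisions):
    # Decide D1 only if every cognitive user decides D1.
    return int(all(decisions))


def majority_rule(decisions, k=None):
    # Decide D1 if at least k users decide D1 (default: more than half).
    if k is None:
        k = len(decisions) // 2 + 1
    return int(sum(decisions) >= k)


print(or_rule([0, 0, 1]), and_rule([0, 0, 1]), majority_rule([1, 1, 0]))  # 1 0 1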

This research combines the existing Chinese and English phonetic waveform energy fusion and decision fusion criteria to propose a two-layer data fusion mechanism. The “or” criterion is used as the fusion criterion in the upper-layer decision fusion, and this method is also used as the comparison object in the simulation analysis. The cooperative spectrum sensing performance of a cognitive network using the OR criterion is given below [21].

In the formula, Q_d, Q_f, and Q_m represent, respectively, the detection probability, false alarm probability, and missed alarm probability when the cognitive users in the cognitive network perform cooperative sensing. As above, P_d,i, P_f,i, and P_m,i represent the local detection probability, false alarm probability, and missed alarm probability of the i-th cognitive user.
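Under the conditional-independence assumption commonly made in this setting, the OR-rule cooperative probabilities referenced above can be computed as follows (a sketch; the local probabilities are illustrative values):

# OR-rule cooperative sensing performance: the network misses only if every
# user misses, so Q_d = 1 - prod(1 - P_d_i), Q_f = 1 - prod(1 - P_f_i), and
# Q_m = prod(1 - P_d_i) = 1 - Q_d. Local probabilities below are illustrative.
import numpy as np


def or_rule_performance(p_d, p_f):
    p_d, p_f = np.asarray(p_d), np.asarray(p_f)
    q_d = 1.0 - np.prod(1.0 - p_d)    # cooperative detection probability
    q_f = 1.0 - np.prod(1.0 - p_f)    # cooperative false alarm probability
    q_m = np.prod(1.0 - p_d)          # cooperative missed detection probability
    return q_d, q_f, q_m


print(or_rule_performance([0.60, 0.70, 0.65], [0.05, 0.05, 0.05]))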

In the traditional Chinese and English sound waveform energy detector, the cognitive user first passes the received signal through a band-pass filter to extract the signal of the target frequency band and then performs a squaring operation on it. It then accumulates the Chinese and English sound waveform energy over a period of time through the integrator and finally compares the result with a preset threshold (single threshold) to make a local decision on whether the authorized user exists. The single-threshold detection model is shown in Figure 1(a).

In the figure, Y_i represents the Chinese and English sound waveform energy in the received signal of cognitive user i, and the decisions D1 and D0 on the existence of an authorized user are made by comparing this energy with the threshold λ: when Y_i ≥ λ, the decision is D1; when Y_i < λ, the decision is D0.
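The single-threshold pipeline of Figure 1(a) can be sketched as follows; the samples are assumed to be already band-pass filtered and normalized to unit noise power, and the amplitude and threshold values are illustrative:

# Minimal simulation of the single-threshold detector: square the received
# samples, accumulate the energy over the window, and compare with λ.
import numpy as np

rng = np.random.default_rng(0)


def energy_statistic(samples):
    # Integrator output: accumulated energy of the observation window.
    return np.sum(np.abs(samples) ** 2)


def local_decision(samples, lam):
    # D1 (authorized user present) if the accumulated energy reaches λ.
    return 1 if energy_statistic(samples) >= lam else 0


n = 10                            # 2*TW samples per observation window
noise = rng.normal(size=n)        # H0: noise only
signal = 1.0 + noise              # H1: toy constant signal plus noise
lam = 18.3                        # illustrative threshold (P_f ≈ 0.05 for n = 10)
print(local_decision(noise, lam), local_decision(signal, lam))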

In the double-threshold detection model, each cognitive user compares the Chinese and English sound waveform energy of its received signal against two preset thresholds λ1 and λ2 (λ1 < λ2) in the Chinese and English sound waveform energy detector to make a local decision, as shown in Figure 1(b). According to the above description, the local decision criterion of this method is as follows:

Decision D1: Y_i ≥ λ2

Decision D0: Y_i ≤ λ1

Delayed decision: λ1 < Y_i < λ2

When the Chinese and English sound waveform energy Y_i of the received signal obtained by cognitive user i lies in the interval (λ1, λ2), the user considers the energy value ambiguous and cannot determine from it whether an authorized user exists. Therefore, the value Y_i is reported to the decision center as it is. After obtaining the local decision results reported by all cognitive users in the cognitive network together with the reported sound waveform energy values, the decision center performs two-level information fusion to obtain the final decision on the existence of authorized users. The specific spectrum sensing method and information fusion rules are as follows:

Each cognitive user first performs local spectrum detection within its own range and obtains the output Y_i of the Chinese and English phonetic waveform energy accumulator. If Y_i satisfies Y_i ≥ λ2 or Y_i ≤ λ1, the decision is D1 or D0, respectively. If Y_i lies in the delayed decision interval, that is, λ1 < Y_i < λ2, the cognitive user does not form a local decision result but uploads Y_i to the decision center and waits for its delayed decision. It follows that the local sensing result reported by a cognitive user to the decision center consists of two kinds of information: the local decision and the Chinese and English sound waveform energy value, namely,
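A sketch of the local report just described, with a hard decision outside the two thresholds and the raw energy forwarded when it falls in the delayed decision interval (the dictionary format and the threshold values are illustrative):

# Dual-threshold local sensing report: hard decision outside (λ1, λ2),
# raw energy forwarded to the decision center otherwise.
def local_report(energy, lam1, lam2):
    if energy >= lam2:
        return {"decision": 1, "energy": None}    # decide D1 locally
    if energy <= lam1:
        return {"decision": 0, "energy": None}    # decide D0 locally
    return {"decision": None, "energy": energy}   # defer: report the raw energy


print(local_report(25.0, 12.0, 20.0))   # clear D1
print(local_report(15.0, 12.0, 20.0))   # deferred, energy forwarded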

To facilitate the theoretical derivation of the detection performance below, and without loss of generality, it is assumed here that the decision center receives the data packets reported by all cognitive users, containing both local decisions and raw energy values. The decision center then applies the Chinese and English sound waveform energy fusion method to the reported energy values to obtain the delayed decision:

The threshold in the expression for the delayed decision is obtained by the decision center according to the users’ communication requirements and the target false alarm probability set by the required spectrum sensing performance. From the cognitive user sensing process described above, it can be seen that this method considers that some of the cognitive users cannot make local decisions based on the sound waveform energy output by their energy accumulators. Therefore, these cognitive users report the Chinese and English sound waveform energy values of their received signals directly to the decision center without processing, and the decision center makes a delayed decision based on these energy values, drawing on a large amount of past sensing data. This is equivalent to performing Chinese and English phonetic waveform energy fusion on these values and replacing the unreliable local decision results of those cognitive users. The Chinese and English sound waveform energy value obtained by the decision center obeys the following chi-square distribution:

Among them, the noncentrality parameter is the sum of the signal-to-noise ratios of the cognitive users that reported raw Chinese and English sound waveform energy values. Finally, according to the local decision results of the cognitive users and the above delayed decision, the final decision obtained by the decision center using the “or” criterion for decision fusion is:

Among them, a final decision of D1 corresponds to hypothesis H1, indicating that the decision center ultimately determines that the authorized user exists; conversely, D0 corresponds to H0, and the decision center determines that the authorized user does not exist.
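The two-level fusion at the decision center can then be sketched as follows: the raw energies reported by deferring users are combined by energy fusion and compared with a center threshold to form the delayed decision, which is OR-fused with the hard local decisions. The center threshold and the report format follow the sketch above and are assumptions, not the paper's exact settings.

# Two-layer fusion at the decision center: energy fusion of deferred reports,
# then OR-rule decision fusion with the hard local decisions.
def center_decision(reports, lam_c):
    hard = [r["decision"] for r in reports if r["decision"] is not None]
    deferred = [r["energy"] for r in reports if r["decision"] is None]

    delayed = 0
    if deferred:                                   # delayed decision via energy fusion
        delayed = 1 if sum(deferred) >= lam_c else 0

    return 1 if (any(d == 1 for d in hard) or delayed == 1) else 0   # OR rule


reports = [
    {"decision": 0, "energy": None},
    {"decision": None, "energy": 15.0},
    {"decision": None, "energy": 17.5},
]
print(center_decision(reports, lam_c=30.0))        # delayed decision yields D1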

Based on the dual-threshold Chinese and English sound waveform energy detection method discussed above, its spectrum sensing performance is now analyzed theoretically. As before, P_d, P_f, and P_m are still used to represent the local detection probability, false alarm probability, and missed alarm probability of a cognitive user. In addition, two probability values are introduced to denote the probability that the Chinese and English speech waveform energy falls into the delayed decision interval under the two hypotheses (authorized user present or absent), to facilitate the subsequent discussion. Their expressions are:

Therefore, the aforementioned local spectrum sensing performance of cognitive users can be modified as:

The same symbols are still used here to represent the detection probability, false alarm probability, and missed alarm probability when all cognitive users in the network cooperate:

It can be seen from the above formulas that the probability that the energy value falls between λ1 and λ2 under the two hypotheses on the existence of the authorized user has a very important influence on the spectrum sensing performance of the dual-threshold Chinese and English tone waveform energy detector. When both of these probabilities are equal to zero, the method proposed in this paper degenerates into the traditional single-threshold Chinese and English sound waveform energy detection method. In that case, all cognitive users in the cognitive network report their local decisions to the decision center, and the decision center, having only this kind of data, again uses the “or” criterion to fuse all local decision results to obtain the final result.

According to the sensing process and realization principle of the dual-threshold Chinese and English tone waveform energy detector described above, this part uses simulation analysis to examine its detection performance in detail. The simulation results are presented in the form of ROC (receiver operating characteristic) and CROC (complementary receiver operating characteristic) curves. Two simulation scenarios are set here, which differ mainly in the setting of one parameter.

In both scenarios, the comparison object is the traditional single-threshold detection method, and the remaining parameters are set as follows:

We first examine the curves in Figure 2. In the first scenario, the dual-threshold detection method improves the spectrum sensing performance only slightly compared with the traditional method. When the parameters are further increased, as shown in Figure 3, the improvement in spectrum sensing performance becomes more significant; at this point, the probability that the energy value of the received signal falls into the delayed decision interval is larger (0.1) than in the first scenario. Careful observation of Figure 3 also shows that, at a given false alarm probability, the detection probability of this method achieves a gain of about 1.23 times over the traditional single-threshold detection method, and the gain in detection probability increases further as the false alarm probability decreases.
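Because the original scenario parameters are not reproduced here, the following Monte Carlo sketch only illustrates how such a comparison can be set up: it estimates the cooperative false alarm and detection probabilities of single-threshold OR fusion and of the two-layer dual-threshold scheme at one operating point, with all thresholds, the SNR, and the number of users chosen arbitrarily.

# Monte Carlo sketch comparing single-threshold OR fusion with the two-layer
# dual-threshold scheme at one operating point. All parameter values are
# illustrative and not the settings used in the paper's simulations.
import numpy as np

rng = np.random.default_rng(1)
N, m, snr = 4, 10, 10 ** (-5 / 10)   # users, degrees of freedom 2TW, SNR (-5 dB)
lam = 18.3                           # single threshold
lam1, lam2 = 15.0, 22.0              # dual thresholds placed around lam
lam_c_per_user = lam                 # center threshold scales with deferral count
trials = 50_000


def draw_energies(h1):
    if h1:
        return rng.noncentral_chisquare(m, 2 * snr, size=(trials, N))
    return rng.chisquare(m, size=(trials, N))


def single_threshold(E):
    return (E >= lam).any(axis=1)                    # OR fusion of hard decisions


def dual_threshold(E):
    hard_d1 = (E >= lam2).any(axis=1)
    deferred = (E > lam1) & (E < lam2)
    k = deferred.sum(axis=1)
    deferred_sum = np.where(deferred, E, 0.0).sum(axis=1)
    delayed = (k > 0) & (deferred_sum >= k * lam_c_per_user)   # energy fusion
    return hard_d1 | delayed


for name, rule in (("single", single_threshold), ("dual", dual_threshold)):
    q_f = rule(draw_energies(h1=False)).mean()
    q_d = rule(draw_energies(h1=True)).mean()
    print(f"{name:>6}: Q_f = {q_f:.3f}, Q_d = {q_d:.3f}")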

This paper studies the approximation error introduced by using a Gaussian distribution instead of the chi-square distribution in the Chinese and English phonetic waveform energy detection method. The overall and pointwise approximation errors are studied by establishing a normalized mean square error function and an absolute error function, respectively. Moreover, a reasonable lower limit on the degrees of freedom of the chi-square distribution for which this approximation holds is given by numerical simulation, and the chi-square distribution probability value at which the approximation error is smallest is identified.

Spectrum sensing technology based on Chinese and English speech waveform energy detection has the advantages of low cost and simple deployment, making it an increasingly active research hotspot in this field. In these studies, the chi-square distribution is the mathematical tool needed to analyze the performance of the Chinese and English sound waveform energy detectors: the central chi-square distribution corresponds to hypothesis H0 (the authorized user is absent), and the noncentral chi-square distribution corresponds to hypothesis H1 (the authorized user is present).

The English sound waveform energy in the signal observed by the receiver should obey the following chi-square distribution.

Among them, the left-hand side represents the energy value of the sound waveform in the signal received by the cognitive user, and the two distributions are the central and noncentral chi-square distributions with 2TW degrees of freedom, respectively. The noncentrality parameter of the latter is 2γ, where γ is the signal-to-noise ratio at the cognitive user receiver. For brevity in the description below, the degrees of freedom 2TW of the chi-square distribution are denoted by m, and we obtain:

If m is assumed to be large enough (a condition easily satisfied when multiple cognitive users perform cooperative sensing), then according to the central limit theorem, the energy statistic should tend toward the following Gaussian (normal) distribution:

Among them, the left-hand side represents the Gaussian approximation of the sound waveform energy in the received signal, and N(μ, σ²) denotes the Gaussian (normal) distribution with mean μ and variance σ².
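A quick numerical check of this approximation, assuming the standard moment-matched Gaussian (mean m + nc and variance 2(m + 2nc) for a chi-square variable with m degrees of freedom and noncentrality nc, with nc = 0 giving the central case):

# Gaussian (CLT) approximation of the (non)central chi-square distribution.
import numpy as np
from scipy import stats


def chi2_cdf(x, m, nc=0.0):
    return stats.ncx2.cdf(x, m, nc) if nc > 0 else stats.chi2.cdf(x, m)


def gaussian_approx_cdf(x, m, nc=0.0):
    mean, var = m + nc, 2.0 * (m + 2.0 * nc)
    return stats.norm.cdf(x, loc=mean, scale=np.sqrt(var))


m, nc = 200, 20.0
x = np.linspace(150.0, 300.0, 5)
print(np.abs(chi2_cdf(x, m, nc) - gaussian_approx_cdf(x, m, nc)))   # small errors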

According to formulas (24) and (25), we can obtain the cumulative distribution functions (CDFs) of the exact and approximate energy statistics under the two hypotheses:

Among them, Γ(·) represents the gamma function. From these, we can define the following absolute error function (AEF) and normalized mean square error function (NEF):

In formulas (28) and (29), N denotes the number of sampled values of the variable x and i indexes each sampled value. It can be seen from the expressions of the AEF and NEF that the NEF averages out the influence of the variable x on the error, so the NEF only depicts the overall trend of the approximation error as a function of the degrees of freedom m. In contrast, the AEF precisely reveals the joint effect of both x and m on the approximation error.
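Since the exact expressions in (28) and (29) are not reproduced here, the following sketch implements one plausible reading of them: the AEF as the pointwise absolute difference between the exact and approximate CDFs, and the NEF as the squared error summed over a grid of x values and normalized by the summed squared exact CDF. The grid and the normalization are assumptions.

# One plausible realization of the AEF and NEF discussed above.
import numpy as np
from scipy import stats


def aef(x, m, nc=0.0):
    exact = stats.ncx2.cdf(x, m, nc) if nc > 0 else stats.chi2.cdf(x, m)
    approx = stats.norm.cdf(x, loc=m + nc, scale=np.sqrt(2.0 * (m + 2.0 * nc)))
    return np.abs(exact - approx)


def nef(m, nc=0.0, grid=None):
    if grid is None:
        grid = np.linspace(0.0, 3.0 * (m + nc), 500)   # sample points x_i
    errors = aef(grid, m, nc)
    exact = stats.ncx2.cdf(grid, m, nc) if nc > 0 else stats.chi2.cdf(grid, m)
    return np.sum(errors ** 2) / np.sum(exact ** 2)


for m in (10, 50, 200):
    print(m, nef(m), nef(m, nc=2 * 10 ** (-20 / 10)))   # error shrinks as m grows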

As mentioned earlier, the NEF does not reveal the exact joint effect of x and m on the error but only shows the approximate variation of the error for different degrees of freedom. Since spectrum sensing involves two hypotheses, the presence or absence of the authorized user's signal (corresponding to H1 and H0), the errors in these two cases are analyzed separately in the following sections.

The expression for the NEF under hypothesis H0 is rewritten as follows:

It can be seen that Equation (30) contains a fraction formed by two sum-of-squares functions, and each function contains an integral (see Equations (26) and (27)). In particular, the chi-square CDF contains the gamma function, so a theoretical analysis of the shape of the NEF would be very complicated. Instead, we discuss the NEF by numerical calculation, with the parameters in the simulation set as:

The expression for the NEF under hypothesis H1 is rewritten as follows:

The difference from the previous hypothesis is that, under H1, the NEF is a bivariate function of the degrees of freedom m and the noncentrality parameter, which reflects the real-time signal-to-noise ratio of the cognitive user. Therefore, this subsection discusses the impact of both variables on the NEF.

As before, we investigate the shape of the NEF by numerical calculation. The parameters in formula (32) are set as:

In addition, following the recommendation of IEEE 802.22 WRAN, and because the signal-to-noise ratio of the cognitive user's received signal is very poor when the authorized user's signal is present, the noncentrality parameter (i.e., the signal-to-noise ratio) is set to -20 dB, -10 dB, and 0 dB.

First, to simplify the following discussion, we rewrite formula (28) under hypothesis H0; the remaining parameter is fixed at 3 and is not explicitly expressed in the formula:

Since both the exact and the approximate distribution functions are cumulative distribution functions, they have limits of the form:

Therefore, the limit of AEF is:

Formula (36) shows that when x is sufficiently large, the absolute error between the central chi-square distribution and the Gaussian distribution is negligible.
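Because the original equation images are not reproduced here, the limit argument behind formulas (34)-(36) can be reconstructed, in the notation used above, roughly as follows:

% Reconstruction (under the notation used above) of the limit behind (34)-(36).
\lim_{x\to\infty} F_{\chi^2_m}(x) = 1, \qquad
\lim_{x\to\infty} \Phi\!\left(\frac{x - m}{\sqrt{2m}}\right) = 1,
\qquad\Longrightarrow\qquad
\lim_{x\to\infty} \mathrm{AEF}(x)
  = \lim_{x\to\infty}\left| F_{\chi^2_m}(x) - \Phi\!\left(\frac{x - m}{\sqrt{2m}}\right)\right|
  = |1 - 1| = 0.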

Figure 4 shows both the absolute error and the probability value of the central chi-square distribution using a plot with two vertical axes, with the variable x on the horizontal axis. The relationship between the two can be seen clearly from this plot: the absolute error gradually decreases and is smallest in the region around a particular probability value. It follows that replacing the central chi-square distribution with a Gaussian distribution yields the best approximation near the corresponding false alarm probability.

To simplify the subsequent discussion, we rewrite formula (28) under hypothesis H1 as follows. Here, the noncentrality parameter (i.e., the signal-to-noise ratio) is used as a parameter and set to -20 dB to represent poor channel conditions, and the remaining parameter is fixed at 4 (a stricter lower limit than the value of 3 used in the previous subsection, chosen to show the difference); neither is explicitly expressed in the formula:

Since both the exact and the approximate distribution functions are cumulative distribution functions, they are monotonically increasing functions of x and have limits of the form:

Thus, the limit of the AEF is obtained as:

Formula (39) shows that the absolute error between the noncentral chi-square distribution and the Gaussian distribution can be ignored when x is sufficiently large.

Figure 5 shows both the absolute error and the probability value of the noncentral chi-square distribution using a plot with two vertical axes, with the variable x on the horizontal axis. The relationship between the two can be seen clearly from this plot: the absolute error gradually decreases and is smallest in the region around a particular probability value. It follows that replacing the noncentral chi-square distribution with a Gaussian distribution yields the best approximation near the corresponding detection probability.

4. The Chinese-English Machine Translation Model Integrating Attention Mechanism and Bidirectional Neural Network

The Chinese-English machine translation model constructed in this paper adopts the classic encoder-decoder architecture, and the internal structure is shown in Figure 6.

In this paper, a new model for machine translation based on logarithmic position representation and the self-attention mechanism is proposed, as shown in Figure 7. The encoder contains a self-attention layer combined with logarithmic position representation and a fully connected feed-forward network (FFN) layer. The decoder contains a self-attention layer combined with logarithmic position representation, an encoder-decoder attention layer, and a fully connected FFN layer. The output layer consists of a linear transformation layer followed by a Softmax layer.
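The paper does not spell out the exact form of the logarithmic position representation, so the following is only a hypothetical sketch: a sinusoidal-style encoding whose position index is compressed as log(1 + pos), added to the token embeddings before standard scaled dot-product self-attention. All names, the encoding formula, and the dimensions are illustrative assumptions.

# Hypothetical sketch: logarithmic position encoding plus self-attention.
import numpy as np


def log_position_encoding(seq_len, d_model):
    pos = np.log1p(np.arange(seq_len))[:, None]                 # logarithmic position index
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))   # (seq_len, d_model)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                     # scaled dot product
    return softmax(scores) @ v


rng = np.random.default_rng(0)
seq_len, d_model = 6, 16
x = rng.normal(size=(seq_len, d_model)) + log_position_encoding(seq_len, d_model)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)                   # (6, 16)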

On the basis of the above research, this paper verifies the effect of the proposed Chinese-English machine translation model that combines the attention mechanism and the bidirectional neural network. The translation quality of the model in the Chinese-to-English and English-to-Chinese directions is evaluated through multiple sets of translation experiments, and the results are shown in Table 1 and Table 2, respectively.

From the above research, it can be seen that the Chinese-English machine translation model proposed in this paper, which combines the attention mechanism and a bidirectional neural network, achieves a good practical translation effect.

5. Conclusion

There are large structural differences between Chinese and English. English emphasizes form, while Chinese emphasizes meaning. The grammatical structure of English is rigorous and ordered, and the relations between main and subordinate elements are clear. It is no exaggeration to say that a long English sentence is often a compound sentence running several lines, with a complex structure that includes main clauses, adverbial clauses, attributive clauses, appositives, adjectives, and adverbs. In contrast, the syntactic structure of Chinese is loose and flexible, and different meanings can be produced simply through flexible collocation. This paper combines an attention mechanism with a neural network algorithm, applies the combination to Chinese-English machine translation to improve its effect, and proposes a new machine translation model based on logarithmic position representation and the self-attention mechanism. The experimental research shows that the proposed Chinese-English machine translation model, which combines the attention mechanism and a bidirectional neural network, achieves good practical translation performance.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares no competing interests.

Acknowledgments

This study is sponsored by the Henan Polytechnic University.