Abstract

Japanese, as a global language, can aid cultural exchange and improve mutual understanding. When translating news, the translator should approach each sentence from the standpoint of cross-cultural communication so that the translation is more accurate and natural. Only by understanding the relationship between Japanese and Chinese can translators render more news reports, spread foreign information, and let readers learn more about foreign events. This paper analyzes translations with corpus methods, exploiting the technical advantages of corpora in terms of word frequency, vocabulary, text correspondence, and semantic prosody, and moves beyond traditional translation research by drawing on a large body of corpus statistics. Because of differences between the source language and the target language, the frequency and richness of vocabulary in the translation are somewhat lower than in the original text, but the numerical difference is small and the high-frequency vocabulary is essentially consistent, which preserves the core content of the original. The original sentences are analyzed according to their content at the sentence and discourse levels. In the Chinese-to-Japanese translation data, the count for the original text is 358 occurrences, and the high-frequency modal words listed in the table total 1274 occurrences. The most frequent modal, “should,” accordingly accounts for 33% of the high-frequency modal words used. For the Japanese original text the corresponding figure is 332, with the highest-frequency verb occurring 247 times in total. These statistics make the description of the linguistic characteristics of the original and the translation, and the choice of translation strategy, more convincing.

1. Introduction

News is timely and practical, and its translation must be faithful. Faced with different kinds of news, translators need to understand the content fully and use translation skills to convert the language so that more people can understand the news. News translation differs from other types of English translation. It covers a wide range of topics, including political, economic, cultural, diplomatic, and military activities, and many professional terms are used. If the translator does not understand these events well enough, the accuracy of the translation may suffer, and the ultimate goal of news English translation may not be realised [1]. Furthermore, news English translation is cross-cultural and is influenced primarily by the cultures of the countries involved. For example, although English is spoken in Australia, India, and the United Kingdom, expressions differ because national conditions differ. Attention to the details of cultural difference is therefore required during translation in order to improve accuracy [2]. When writing for readers from different cultural backgrounds, the translator should adjust word meanings and add annotations where necessary so that readers understand the news event better. For example, in China red represents happiness and prosperity, whereas in some other countries it represents violence and danger.

A news translation is a text that combines information and publicity. It serves the average reader, and its goal is to provide up-to-date information on local conditions and customs as well as on geographic, cultural, natural, and other topics. From a functional standpoint, the main features of news discourse are information transmission and behavioural induction. China and Western countries differ significantly in culture, and these differences are usually communicated through language. Through long-term cultural influence, Chinese and Western cultures have each developed distinct aesthetic habits, which affect both language intuition and expression. News translations therefore contain some unique expressions and purposes, which give translated news text a distinct style. Japanese, as a global language, can aid cultural exchange and improve mutual understanding. When translating news, the translator should approach each sentence from the standpoint of cross-cultural communication so that the translation is more accurate and natural. Only by understanding the relationship between Japanese and Chinese can translators render more news reports, spread foreign information, and let readers learn more about foreign events.

According to the statistical study of the corpus, the frequency and richness of vocabulary in the target text are somewhat lower than in the original, but the difference is small and the high-frequency vocabulary is basically the same, which ensures that the core content of the original is transferred accurately [3]. In the selection of semantic prosody, the translator tends toward neutral choices compared with the tendency of the original language. Overall, the node word presents a positive to mixed semantic prosody. This arises because the node word takes contextual factors into account when selecting its collocates, which shifts the semantic atmosphere around it [4]. Analyzing semantic prosody helps the translator use vocabulary correctly and so present a semantic atmosphere that better fits the language. In choosing vocabulary, the translator should make use of the lexical and grammatical features of Japanese so that sentences are expressive and rich and closer to the linguistic characteristics and reading habits of the target language. Modality reflects the attitude of the speaker, and different modal words carry different meanings when expressing attitude and standpoint. When translating, special attention should be paid to word choice, especially where emotive words are used. Because the translator is not the author of the original text, the translator's understanding of the original and feeling for it will differ from the author's. In accordance with translation norms, this paper therefore uses the corpus to investigate how translations can be brought closer to the original text.

The innovation of this article is that translation problems in functional language can be found through data analysis of the corpus. The Japanese translation quality assessment in this paper can serve as a model for research on translation quality assessment, with the corpus used to measure lexical richness, lexical density, and other factors. The findings help to uncover problems in individual translations and so offer some guidance for future news translation work.

A corpus usually refers to language material stored in electronic form for language research, collected from naturally occurring written or spoken samples to represent a specific language or language variety. After scientific selection and annotation, a corpus of appropriate scale can reflect and record the actual use of the language [5, 6]. Through corpora, people observe and grasp the regularities of language facts and analyze and study language systems. The corpus has become an indispensable basic resource for linguistic theory, applied research, and language engineering.

Based on text type theory and Buhler's classification of three language functions, Liu divides discourse into informative discourse, expressive discourse, and appellative discourse [7]. Kaur believes that translation quality can be evaluated according to the functions shared by the original text and the translation and points out that an objective evaluation should comprehensively analyze the factors affecting the translation from the perspective of text type, linguistic elements, and nonverbal elements [8]. Takigawa believes that two functions of language directly affect the quality of translation. These two factors are considered in parameter setting, that is, whether the translation is equivalent to the original in the conceptual and interpersonal meaning of the text [9]. Chen proposed the “learning control” model of language, namely, the LC model. It is composed of four systems: a language automatic control system, a language automatic learning system, a language knowledge automatic feedback system, and a home page and text automatic detection system [10]. Tomyuk et al. believe that the dynamic (circulation) corpus inherits all the research results of previous corpora rather than rejecting them; it simply adds the attribute and attribute value of circulation to the sampled media and texts of the corpus [11]. Zhang has developed a Japanese essay segmentation system that combines a statistics-based Japanese phrase segmentation method with CRF+ [12]. In studying the syntactic analysis of Japanese, Cheng et al. proposed a method for analyzing grammatical relationships in Japanese that relies mainly on the grammatical function markers of Japanese sentences [13]. Shi has developed a Japanese syllable segmentation and word segmentation system that adopts rule-based and statistical methods for syllable segmentation, word segmentation, and part-of-speech tagging [14]. Li and Li proposed a method to extract transliteration equivalent pairs of Japanese and English bilingual named entities from well-aligned Japanese and English comparable corpora [15]. Spear proposes a method for extracting bilingual named entity equivalence pairs by matching semantic information and speech information. The method establishes a matching model between the semantic and speech information of the source language and that of the target language, and its effect is to find some named entity equivalence pairs that are not in the dictionary [16]. Fortune one proposes that, in a bilingual comparable corpus, named entity equivalence pairs have words with the same or similar semantics in their textual context. Two words are selected before and after the named entity and used as the context information of the entity. The similarity of candidate named entity equivalence pairs is then determined by calculating the word vector similarity of the contexts [17]. Kim proposed that, when extracting entity equivalent pairs from comparable corpora, multiple features should first be defined, including phonetic features, Chinese-English character features, and context features. The features are then linearly fused to calculate the similarity of candidate named entity equivalence pairs. However, this method does not consider that different categories of named entities have different key characteristics [18].

There is still very little research on the languages of East Asian countries, and research on Japanese in particular is in its infancy. The methods above work well for building a Chinese-Japanese bilingual entity corpus. Japanese is a synthetic language, in which a word is typically formed by combining several word elements.

3. Bilingual Parallel Corpus

3.1. Constructing an Error Word Library

A corpus is a collection of language records that is useful for language and translation research. It is a large-scale electronic text library of a certain capacity, made up of texts or discourse fragments [19] and collected by random sampling of naturally occurring continuous language. Corpora fall into three main categories: comparable corpora, multilingual corpora, and parallel corpora. Multilingual corpora are used to research translation style. Parallel corpora, which provide an effective frame of reference for studying translated text and have the greatest application potential in translation research, are used primarily to investigate deeper translation issues such as translation norms. Parallel corpora can be aligned at the word, sentence, or paragraph level and can be one-way or two-way; they are used primarily in translation practice, translation teaching, translation research, translator training, dictionary compilation, and machine translation. The parallel corpus is the type of corpus closest to the translation field [20]. The largest differences between bilingual parallel corpora and other corpora lie in the type of material collected and the processing involved. A bilingual parallel corpus pairs each source text with its target text, with correspondence at the level of words, sentences, or whole articles, and some parallel corpora also require the paired texts to be of comparable length. The quality of the correspondence and of the translations directly affects the quality and the construction process of a bilingual parallel corpus. In addition to collection, formatting, and markup, a bilingual parallel corpus requires alignment, which is the most important step. The precision of the alignment determines the viability of the entire parallel corpus.

The entries of a confusion set are used to replace each character of every word in the seed word library, which generates words that may be wrong. Some of the generated words are themselves correct words, so the generated words must be checked against the correct words in the seed word library and the correct ones removed to obtain the error words. The specific construction process is shown in Figure 1.
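The substitution-and-filter procedure described above can be sketched as follows. This is a minimal illustration, not the construction used in the paper: the function name `generate_error_candidates`, the dictionary layout of the confusion set, and the toy English words are all assumptions made for the example.

```python
def generate_error_candidates(seed_words, confusion_sets):
    """Generate probable error words from a seed word library.

    Assumption: confusion_sets maps a character to the characters it is
    commonly confused with. Each character of each seed word is replaced
    in turn, and any result that is itself a seed word is discarded,
    because it is a correct word rather than an error.
    """
    vocabulary = set(seed_words)
    errors = {}
    for word in seed_words:
        candidates = set()
        for i, ch in enumerate(word):
            for alt in confusion_sets.get(ch, ()):
                candidate = word[:i] + alt + word[i + 1:]
                if candidate != word and candidate not in vocabulary:
                    candidates.add(candidate)
        errors[word] = sorted(candidates)
    return errors


# Toy data: "a" is confused with "o" and "e".
print(generate_error_candidates(["cat", "cot"], {"a": ["o", "e"]}))
```

In the toy run, replacing "a" with "o" in "cat" yields "cot", which is a valid seed word and is therefore filtered out, while "cet" is kept as an error word.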

3.2. Establishment of Bilingual Parallel Corpora

If no dedicated parallel corpus exists, or the available ones are of limited value, a self-built bilingual parallel corpus is the best choice. Corpus screening and preparation are required before a bilingual parallel corpus is created. In alignment, each text unit in one language is matched with the text unit in the other language that is its translation or otherwise corresponds to it.

A parallel corpus is a type of language database that stores both original texts and their translations. Bilingual parallel corpora are mostly found in computer-aided translation systems. Such a system exploits the computer's high-speed processing capability: it compares each source sentence to be translated with the source sentences stored in the library and lists the stored source-translation pairs, sorted by matching rate, for the translator's reference. The translator then completes the translation with this assistance. When the translator finishes translating a new translation unit, such as a sentence, the system automatically pairs the new translation with its source as a complete match and records the pair, so the corpus grows [21, 22]. The parallel corpus primarily assists the translator's work. When the system returns an existing source-translation pair with an acceptable matching rate, the translator can use it as the situation requires or ignore it. If the translator adopts the existing translation of a matched source text unchanged, the corpus neither modifies the existing translation nor records a new one. If the translator changes an existing translation before using it, the corpus associates the new source text with the revised translation and records the pair. For a completely new source text, the corpus likewise pairs the new translation with its source and records it in the library, as shown in Figure 2 for the bilingual parallel corpus.

All precisely aligned pairs, together with their matching rates, are recorded in the corpus during this process. The parallel corpus keeps track of every source text and translation the translator has handled since the corpus was first used, which makes it the best record of the translator's work [23]. Whether for an individual translator or a translation team, the bilingual parallel corpus plays an increasingly important role as long as the field of translation does not change. The longer the translator works with it, the more translations it accumulates, and the more the translator comes to rely on it.
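The lookup-and-record cycle of such a translation memory can be sketched as below. This is an illustrative sketch only: the class name `TranslationMemory`, the 0.7 threshold, and the use of `difflib.SequenceMatcher` from the Python standard library as the matching rate are assumptions, since the text does not specify how the matching rate is computed.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy bilingual parallel corpus used as a translation memory."""

    def __init__(self):
        self.pairs = []  # list of (source, translation) tuples

    def lookup(self, source, threshold=0.7):
        """Return stored pairs sorted by matching rate, highest first.

        Assumption: the matching rate is the character-level similarity
        ratio from difflib; a real system may use another measure.
        """
        scored = [
            (SequenceMatcher(None, source, stored).ratio(), stored, translation)
            for stored, translation in self.pairs
        ]
        return sorted((m for m in scored if m[0] >= threshold), reverse=True)

    def record(self, source, translation):
        """Record a new pair; an unchanged existing pair is not duplicated."""
        if (source, translation) not in self.pairs:
            self.pairs.append((source, translation))


tm = TranslationMemory()
tm.record("the committee met today", "委員会は今日開かれた")
for rate, stored, translation in tm.lookup("the committee met yesterday"):
    print(round(rate, 2), stored, translation)
```

Reusing an existing translation unchanged leaves the memory as it is, whereas a revised translation is stored as an additional pair, which mirrors the behaviour described above.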

Because many cultural differences arise in Japanese translation, the translator should draw on the language habits of the target readers to make news translations read smoothly, so that readers understand the information in the article on first reading. As an effective method, this kind of translation primarily aims to carry the sentence structure and wording of the original over into Chinese while keeping the translation consistent with the original text. It allows readers to gain a better understanding of other countries' national characteristics and language habits, resulting in more accurate translations and a smaller gap between translation and original [24].

4. Building a Computer-Aided Japanese News Translation System

4.1. Generality of Words

A language contains a very large vocabulary, but the words commonly used in daily exchanges, that is, those with high frequency across fields, regions, and time periods, are called general vocabulary. This paper analyzes generality in three respects: field generality, time generality, and spatial generality. A qualitative inspection is relatively simple, and the distribution of a word across fields can be seen at a glance, but it lacks the support of scientific data, so this paper adopts a quantitative approach. The field distribution of a word is calculated as in formula (1).

The text usage of the word is given by formula (2).

In these formulas, the quantities are the relative frequency of the word in a given field, the average relative frequency of the word over all classes, the total number of texts, and the dispersion coefficient of the word.

The uniformity of the distribution in various fields is calculated as in formula (3).

In formula (3), the quantities are the number of months examined, with the Chinese corpus measured separately for each month, and the frequency of the word in each month.
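A quantitative check of how evenly a word is spread across fields can be sketched as follows. Formulas (1) to (3) are not reproduced in the text, so this sketch does not claim to implement them: it assumes a common stand-in, the coefficient of variation of the per-field relative frequencies, as the dispersion coefficient. The function names and sample counts are hypothetical.

```python
from statistics import mean, pstdev


def relative_frequencies(counts_per_field, tokens_per_field):
    """Relative frequency of one word in each field (count / field size)."""
    return [c / n for c, n in zip(counts_per_field, tokens_per_field)]


def dispersion_coefficient(rel_freqs):
    """Assumed stand-in: coefficient of variation of the relative frequencies.

    0.0 means the word is spread perfectly evenly across fields; larger
    values mean the word is concentrated in a few fields.
    """
    avg = mean(rel_freqs)
    if avg == 0:
        return 0.0
    return pstdev(rel_freqs) / avg


# Hypothetical counts of one word in three fields of 1000 tokens each.
even = relative_frequencies([10, 10, 10], [1000, 1000, 1000])
skewed = relative_frequencies([30, 0, 0], [1000, 1000, 1000])
print(dispersion_coefficient(even), dispersion_coefficient(skewed))
```

Under this assumption, a general word would show a low dispersion value in every dimension examined (field, time, and space), and the same function can be applied to per-month frequencies.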

4.2. Semantic Similarity

Semantic similarity can be measured with the vector space model used in information retrieval. The basic idea of the vector space model is to represent a text as a vector of the weights of its feature items. Words or phrases can be selected as feature items, and it is generally held that words are preferable to phrases. Each component of the vector is derived from the relative frequency of the corresponding word; the weight calculation method is as follows:

In this weight, the quantities involved are the total number of texts in the system, the number of texts that contain the word, and the raw frequency of the word, that is, the number of times the word appears in the given text.
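The quantities listed above (total number of texts, number of texts containing a word, and raw frequency in a text) are the ingredients of the standard TF-IDF weight, so a vector space comparison can be sketched as below. The exact weighting formula is not reproduced in the text; the sketch assumes the plain form tf × log(N / n) and cosine similarity, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse weight vector per tokenized document.

    Assumption: weight = raw term frequency * log(N / n), where N is the
    number of documents and n the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: f * math.log(n_docs / doc_freq[t]) for t, f in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [["expert", "work"], ["expert", "help"], ["technology"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share weighted terms receive a similarity between 0 and 1, while documents with no terms in common receive 0.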

4.3. File Alignment

File alignment can be performed in two ways. The first aligns files by their attributes; that is, files whose file numbers or document numbers are equal are considered to be translations of each other. The second determines the alignment relationship by computing cross-language similarity. The basic flow of the alignment is shown in Figure 3.
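The two alignment routes can be combined in a small sketch: match by file number first, and fall back to a similarity score when no number matches. Everything here is an assumption for illustration: the record layout (`name`, `number`, `features`), the `min_score` threshold, and the Jaccard overlap used as the similarity, which presumes the feature words of both files have already been mapped into one language (for example with a bilingual dictionary).

```python
def jaccard(a, b):
    """Assumed similarity: overlap of feature sets in a shared language."""
    x, y = set(a["features"]), set(b["features"])
    return len(x & y) / len(x | y) if x | y else 0.0


def align_files(sources, targets, similarity=jaccard, min_score=0.5):
    """Align source files to target files.

    Step 1: files with equal file numbers are treated as translations.
    Step 2: otherwise pick the most similar target, if it is similar enough.
    """
    by_number = {t["number"]: t for t in targets}
    aligned = []
    for s in sources:
        match = by_number.get(s["number"])
        if match is None:
            scored = [(similarity(s, t), t) for t in targets]
            best = max(scored, key=lambda p: p[0], default=(0.0, None))
            match = best[1] if best[0] >= min_score else None
        aligned.append((s["name"], match["name"] if match else None))
    return aligned


sources = [
    {"name": "s1", "number": 1, "features": ["expert"]},
    {"name": "s2", "number": 9, "features": ["work", "help"]},
]
targets = [
    {"name": "t1", "number": 1, "features": ["government"]},
    {"name": "t2", "number": 2, "features": ["work", "help", "plan"]},
]
print(align_files(sources, targets))
```

A source file that matches neither by number nor by similarity is left unaligned rather than forced onto a poor match.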

Files in various languages must be handled. A feature string extraction method that relies on a thorough understanding of each language gives better accuracy and resolution, but its range of application is very limited, because the number of languages one can be familiar with is limited. A method with broadly applicable extraction characteristics is therefore employed. It assumes that, once stop words (words that carry no actual meaning in the article) are removed, the more frequent a remaining word is, the better it represents the article. This assumption holds fairly generally for articles with descriptive content. As a result, little language-specific knowledge is needed to extract the feature string. Chinese and Japanese, however, require auxiliary word segmentation and part-of-speech tagging tools. By tagging the words, the programme can focus on words with certain parts of speech, such as nouns and verbs.

The extraction procedure differs slightly between languages; first, the way words appear in the text is analyzed. In a language like English, words are clearly delimited by the spaces between them, so there is no need to segment English text. Chinese could be processed character by character, but the characters nested inside a word often do not carry the meaning of the word, and translation usually operates on words rather than on individual characters. Word segmentation is therefore required for Chinese, and the article is divided into words rather than treated as single characters. This project uses the Chinese Academy's word segmentation interface. Through this interface, the part of speech of each word can be obtained and words that need not be considered, such as pronouns, can be discarded; the frequency of the remaining words in the article is then counted, and the feature string of the article is derived from the word frequencies.
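Frequency-based feature extraction of this kind can be sketched as follows. The sketch assumes the text has already been segmented and tagged by an external tool, so it takes (word, tag) pairs as input; the tag labels `n` and `v`, the function name `extract_features`, and the sample tokens are hypothetical.

```python
from collections import Counter


def extract_features(tagged_tokens, keep_pos=("n", "v"),
                     stop_words=frozenset(), top_k=5):
    """Return the top_k most frequent words with the wanted parts of speech.

    Assumption: tagged_tokens is a list of (word, pos_tag) pairs produced
    by a segmentation and tagging tool; stop words are dropped first.
    """
    counts = Counter(
        word
        for word, pos in tagged_tokens
        if pos in keep_pos and word not in stop_words
    )
    return [word for word, _ in counts.most_common(top_k)]


tokens = [
    ("expert", "n"), ("of", "u"), ("work", "n"), ("expert", "n"),
    ("propose", "v"), ("he", "r"), ("expert", "n"), ("work", "n"),
]
print(extract_features(tokens, top_k=2))
```

Here pronouns and particles are excluded by their tags, and the two most frequent nouns form the feature string of the toy article.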

5. Translation Quality Assessment

5.1. Semantic Prosody Analysis Based on the Chinese-Japanese Parallel Corpus

Semantic prosody is an important concept in corpus linguistics. It does not belong to a particular word; rather, it is an atmosphere of meaning that the context casts around a word, and it carries an emotional colouring. Semantic prosody expresses a tendency of attitude, not an absolute attitude.

With a large quantity of complex collocation data, it may be difficult to find a feasible line of analysis, or there may be no clue to follow at all. Extracting concordance lines at a fixed interval makes the study tractable, and the extracted lines are both manageable and representative. Because the selected node word is a noun, its main collocates are concentrated immediately before and after it, and it collocates with adjectives and verbs; taking into account Chinese expressions and particles of different tone, the left and right span for Chinese is set to 2. Word order in Japanese sentences is more flexible, so the span for the Japanese translations and originals is set to L5-R5. Statistics are then compiled by extracting the common collocates on the left side of the node word. To determine the significant collocates of the node word, the node word is searched with a minimum frequency threshold, and the collocates are ranked by MI value. MI stands for mutual information, that is, a measure of how strongly the occurrence of one word is associated with the co-occurrence of another; it is a simple measure of collocation strength.
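Collecting collocates inside a span and scoring them by mutual information can be sketched as below. Several MI variants are used in corpus linguistics and the text does not say which one the study applied; this sketch assumes the simple form MI = log2(f(node, collocate) × N / (f(node) × f(collocate))) without span correction. The function name and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def collocates_mi(tokens, node, left=5, right=5, min_freq=1):
    """Score collocates of `node` within an L/R span by mutual information.

    Assumption: MI = log2(joint * N / (f(node) * f(collocate))), where
    joint counts co-occurrences inside the span and N is the corpus size.
    """
    total = len(tokens)
    freq = Counter(tokens)
    joint_counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            joint_counts.update(window)
    scores = {}
    for word, joint in joint_counts.items():
        if word != node and joint >= min_freq:
            scores[word] = math.log2(joint * total / (freq[node] * freq[word]))
    return scores


tokens = ["good", "work", "is", "good", "work", "and", "bad", "luck"]
print(collocates_mi(tokens, "work", left=1, right=1))
```

Raising `min_freq` plays the role of the minimum frequency threshold, and the `left` and `right` arguments correspond to spans such as L5-R5.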

Finally, the semantic prosody is determined from the high-frequency collocates of the node word. Based on the semantic prosody of the selected word, the original text and the translations are analyzed in terms of wording and technique, together with the contexts in which the vocabulary is used and a functional analysis of its structure. The node word “work” is taken as the example, as shown in Table 1 and Figures 4 and 5.

As the table shows, the verbs that collocate with the node word “work” often present a positive tendency once the extended context is examined. Analysis of the extended context shows that most occurrences of “work” carry a positive tendency and a small number carry a negative sense; the context reveals the semantic halo around the word.

5.2. Japanese Structure Classification Based on the Bilingual Parallel Corpus

Table 1 shows that “増加” (increase) and “improvement” are each used more than 20 times. The usage statistics show that, among Japanese structures, those corresponding to verbs are much more numerous, and the verb “increase” has the highest overall proportion. The morphology, collocation, and semantics of modality are complex and diverse, so a comprehensive and accurate analysis is difficult to achieve with the example-based method of traditional translation studies. With the aid of corpus tools, a large amount of corpus material can be retrieved and analyzed quickly and efficiently. The parallel corpus is therefore used to compare the source language, the translation, and the original text statistically and so explore the characteristics and regularities of the translation. The text, translation, and original text are retrieved from the bilingual parallel corpus, and the analysis is carried out by retrieving the modal words with the highest frequency, as shown in Figure 6.

Looking at the specific use of high-frequency modal verbs, the count for the original text of the Chinese-to-Japanese translation is 358 occurrences, and the high-frequency modal words listed in the table total 1274 occurrences. The most frequent modal, “should,” accordingly accounts for 33% of the high-frequency modal words used. For the Japanese original text the corresponding figure is 332, with the highest-frequency verb occurring 247 times in total.

According to the data above, modal words are frequent in both the original text and the translation of the Chinese-to-Japanese material, and more frequent in the translation, which shows the translators' wish to convey the emotional stance they understood in the original. Compared across the Chinese translations and the Japanese original text, modal expressions are used more in the translations, while modal expressions of possibility are used more frequently in the originals. This outcome is driven by the content of the original text. By reflecting the emotional attitude understood from the original, the translator can convey that attitude to readers of the translation. The Japanese original text also makes little use of modals of necessity, which shows that the language of Japanese social science texts is relatively soft and carries a less forceful emotional attitude.

5.3. Analysis of High-Frequency Words Based on the Chinese-Japanese Parallel Corpus

High-frequency word statistics reveal salient language features as well as the ideological issues hidden behind simple words. The top ten content words in the original text and in the translation of the Chinese-to-Japanese material were counted; they include experts, work, questions, consultants, governments, technology, aspects, proposes, and help. Their frequencies are shown in Figure 7.

According to the high-frequency word statistics for the original text and for the translation, the high-frequency words are basically the same: the most frequent word in the original is “experts,” and the most frequent word in the translation also means “experts.” In addition, the original's high-frequency words “work,” “aspect,” and “help” also appear among the translation's high-frequency words with high frequency, indicating a high degree of agreement in content. The frequency distributions differ, however. The likely cause is that Japanese has many word variants and tends to use multiple synonyms for the same notion, so its sentences are richer and less monotonous. The analysis of high-frequency words in the translation and the original thus shows that the two express largely the same content. The frequencies are also relatively high, which suggests that the translation follows the vocabulary use of the Japanese original text fairly closely.

5.4. Quality Analysis of News Translation from Intercultural Communication Perspective

When a news translator begins translating an article, the first task is to position the news language, establishing the language characteristics and cultural structure of the original text. After positioning, the translator draws on his or her own knowledge and understanding of the article to translate it and so achieve the ultimate goals of news English translation. Under normal circumstances, news texts can be divided into informative, expressive, and appellative types. Translators must use different translation skills for the different text types in order to convey the meaning of the news accurately. For example, in an informative news text, the translator can use objective language without adding personal insights or emotions, and the translated sentences are relatively simple and accurate. Expressive text focuses more on the rendering of sentences and tries to reflect the “personality” of the news; the translation must remain consistent with the language characteristics of the original while being easy for more readers to read and helping them comprehend the content of the news.

Part-of-speech tagging is a basic topic in natural language processing and a basic technology for more advanced processing; it plays an important role in machine translation, speech recognition, text proofreading, information retrieval, and other fields and is widely used in natural language processing systems. Tagging is the process of adding explanatory linguistic information to a corpus; that is, each word in a sentence is assigned a suitable part of speech. Its difficulty lies in words that belong to more than one category, that is, in choosing the right tag for such a word given its context. The experiment calls for a large-scale corpus, but because of practical limitations no standard annotated text was available, so preprocessed news text is used as the training text. To compare parallel performance, the analysis is run on a single node over the bilingual text, with the number of iterations fixed at 20. The experimental results are shown in Figure 8.

Figure 8 shows how the size of the training sample affects training time. The time needed to acquire the feature functions increases with the sample size, and the iterative calculation time also increases with the sample size, but at a diminishing rate; the iterative calculation accounts for most of the time. As the training sample grows, feature functions must be acquired from all samples, so that part of the time grows linearly with the sample. At the same time, as the training sample grows, more of the features have already appeared in earlier text, so the number of new features added is low, and the iterative process only estimates the parameters of the feature functions. The growth rate of the time spent therefore becomes smaller and smaller and tends to level off once the sample is sufficiently large.

The experiments use GB-scale training samples, and it can be seen that as the amount of text increases, the time spent grows predictably until the iterative calculation time stops increasing; after that point, any additional time comes only from acquiring feature functions. When the iterative calculation time no longer grows, additional samples contribute no new feature functions, and enlarging the training sample further has no benefit. The experiment was then repeated with different numbers of nodes and different sequences; the results are shown in Figure 9.

As Figures 9 and 10 show, training time shortens as the semantic number increases, and the effect also holds for the larger CRF model: running multiple semantics improves efficiency. When the semantic number reaches 32, however, the fair scheduling mechanism ensures that every node runs a semantics rather than following the principle of computing close to the data, so some communication costs exceed the computation cost, performance drops, and the execution time rises slightly.

6. Conclusions

This study provides basic corpus collection work for Chinese-Japanese machine translation: bilingual entities are acquired automatically instead of being collected manually, which greatly improves efficiency and increases the size of the corpus. In the Chinese-to-Japanese translation data, the count for the original text is 358 occurrences and the high-frequency modal words listed in the table total 1274 occurrences, so the most frequent modal, “should,” accounts for 33 percent of the high-frequency modal words used. For the Japanese original text the corresponding figure is 332, with the highest-frequency verb occurring 247 times in total. Because many cultural differences arise in Japanese translation, the translator should draw on the language habits of the target readers to make news translations read smoothly, so that readers understand the information in the article on first reading. Translation is primarily used to convert the original Japanese sentence structure into Chinese while keeping the translation consistent with the original language and text. This type of translation allows readers to gain a better understanding of other countries' national characteristics and language habits, resulting in fewer gaps between the translation and the original expression. In journalism, long sentences are used frequently. If the translator cannot analyze their structure accurately, the quality of the news translation suffers and the original news is not communicated. To improve the accuracy of a translation, the translator must first clarify the structure of each sentence and then render it in the overall context of the article.

Future research will improve the auxiliary corpus and the quantification tools, support a greater degree of automation, refine the rules for operating on the corpus, and constrain the granularity of corpus segmentation and annotation, so that language research can be conducted more effectively. It will also aim to provide high-level translation software backed by a quality corpus and to remedy the shortcomings in the implementation of the related algorithms in machine-aided translation, so that the algorithms reach their theoretical efficiency.

Data Availability

The data used to support the findings of this study are available from the author upon request.

Conflicts of Interest

The author does not have any possible conflicts of interest.