Abstract

DDL (data-driven learning) advocates deeply integrating modern information technology and corpus- and search-engine-based resources into education and teaching, which opens up a new path for college English translation instruction. This paper discusses the concept and characteristics of DDL theory. To address the shortcomings of current algorithms, it proposes an automatic scoring model for English translation based on a mixed vector space, which analyzes the target translation from two aspects: grammatical errors and semantic relevance. The findings show that, by combining fragmented English translation learning with adaptive resource push, learners can study English reading on mobile terminals in fragmented learning situations and receive targeted training on their weak points in fragmented time and space, which improves their learning efficiency and effect.

1. Introduction

Many colleges and universities, particularly newly established undergraduate colleges, place far too much emphasis on CET-4 and CET-6 pass rates and continue to teach in an exam-oriented manner. More importantly, from the standpoint of market and social demand, the deepening of reform and opening up and the continued acceleration of economic globalization require a large number of high-quality interdisciplinary professionals, particularly science and engineering graduates who are fluent in English and work in related technologies and businesses. This enormous demand clearly cannot be met by English-major translators alone [1, 2]. College English teachers and scholars should therefore consider how to apply the rapidly evolving research results of computer technology and corpus linguistics to reforming traditional translation teaching modes, in order to meet the new requirements of college English teaching and talent training.

The closed, passive, isolated, and spoon-fed traditional mode of college English translation teaching clearly cannot meet today’s social needs for college English teaching [3–5]. DDL (data-driven learning) advocates deeply integrating modern information technology and corpus- and search-engine-based resources into education and teaching, which opens up a new path for college English translation teaching [6]. Research on DDL-based translation teaching started late in China. It consists mostly of speculative papers that demonstrate the feasibility and advantages of DDL from a macro perspective [7, 8], with relatively few empirical studies [9, 10], all of which focus on translation teaching for English majors. In DDL, learners summarize the characteristics of language use and actively solve learning problems by observing, discussing, and analyzing large amounts of authentic corpus data. Because this kind of knowledge construction through active exploration is also central to social constructivist learning theory, the theory is consistent with the DDL concept and can provide theoretical guidance for constructing a DDL-based university translation teaching mode.

The innovation of this paper is the design and implementation of a systematic automatic translation scoring model. The model checks the grammar of the translation to be tested with a grammar error detection algorithm that combines rules and statistics, and it constructs a mixed semantic space that combines structural knowledge with the distributed semantic knowledge of word vectors. Using a semantic relevance algorithm in this mixed semantic space, it compares the translation to be tested with the standard translation, assigns corresponding weights to the grammar and semantic results, and thereby scores the translation automatically and effectively.

Data preprocessing, an important and indispensable step, has also developed rapidly. Literature [11] puts forward the MI (multiple imputation) algorithm, whose basic theoretical framework was compiled in a collection of papers in 1987. This algorithm fills missing values according to different principles, so that each missing value receives at least two imputed values. Literature [12, 13] experimented on a software engineering efficiency prediction dataset, using KNN (k-nearest neighbor) imputation and class-mean imputation to fill missing values produced by different missingness mechanisms, and found by comparison that missing at random is the safest default assumption about the missingness mechanism. Literature [14, 15] points out that it is widely accepted that specific analyses become meaningless when the missing rate exceeds 40%. Literature [16] shows that a study can be designed with planned missing data. In [17], Naive Bayes classifiers were built for four commonly used missing value processing methods, and the experimental results show that no single method is suitable for all problems. A missing value imputation algorithm based on kernel-weighted regression estimation was proposed in [18]; it successfully filled missing values in gene chip data by using similarity-based kernel weights to perform regression estimation of the missing values. The filling of missing values in random signals is investigated in [19], exploiting the fact that a stationary random sequence is constrained by its autocorrelation. On this basis, using the relationship between the one-step difference variance of a stationary random sequence and the rate of change of its autocorrelation function, a filling method with good accuracy and convergence is derived.
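To make the KNN imputation idea mentioned for [12, 13] concrete, the following Python sketch fills each missing entry with the mean of that column over its k nearest donor rows. It is a minimal illustration under stated assumptions, not the cited authors’ implementation: the function name `knn_impute`, the use of `None` for missing values, the Euclidean distance over columns observed in both rows, and the default `k=2` are all choices made here for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of `rows` with each None replaced by the mean of that
    column over the k nearest rows that have the column observed.

    Distance is Euclidean over the other columns observed in both rows
    (an assumption made for this sketch).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have column j observed
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common columns to measure distance on
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no donor exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

Class-mean imputation, the other method compared in [12, 13], would instead replace the missing value with the mean of the column within the row’s class.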

Adaptive learning can support teaching feedback, learning-status tracking, and learning diagnosis according to learners’ cognitive styles and habits, and can then recommend personalized learning resources and learning paths to maximize learning benefits. Literature [20] lists adaptive learning technology as a key technology in the short term and points out that it can strongly support improving the learning experience and providing personalized support. Adaptive learning systems have been developed that meet learners’ personalized needs according to their learning background, interest preferences, and knowledge level. Literature [21] holds that “personalized adaptive learning” supported by big data will become the next research paradigm of educational technology. Literature [22] proposes a big-data-based personalized adaptive learning structure of eight parts, including an AR (adaptive recommendation) engine, to improve students’ learning initiative, enhance self-efficacy, and optimize the learning process. Literature [23] holds that the construction of an adaptive learning system should take learners’ learning styles into account to improve adaptability and make learning description, diagnosis, and recommendation more accurate. These studies show that current research focuses mainly on the development and application of adaptive learning systems, tools, and technologies, while the core of such systems, namely, the adaptive learning analysis model, is seldom discussed.

3. Research Method

3.1. Construction of College English Translation Teaching Mode Based on DDL

DDL is a language learning method based on computer and corpus technology in which learning is driven by learners’ access to language data [24]. It holds that, in data-driven mode, learners can make full use of the massive authentic language data provided by corpora, observe and analyze the retrieved examples, and summarize language phenomena such as semantic expression, pragmatic features, and grammatical rules, thereby achieving “discovery learning.” DDL is a teaching mode that uses corpus concordancing to provide authentic language as input. The bottom-up process of the DDL method can improve students’ ability to explore in college English translation learning and exercise their ability to discover patterns in the English translations they study. It thus provides a sound learning method, lays a foundation for improving the quality of college English translation teaching, and can support communication in an increasingly globalized society.

Data-driven language learning differs from the traditional teaching mode: learners retrieve language materials from the corpus with corpus retrieval tools, classify the retrieved materials, and, after studying the classified materials, discover the syntactic and semantic usage rules of a specific language structure.
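The retrieve, classify, and induce cycle described above can be sketched with a tiny keyword-in-context (KWIC) concordancer. This Python sketch assumes a corpus given as a list of plain English sentences; `kwic` and `group_by_next_word` are hypothetical helpers, and grouping hits by the word that follows the keyword is just one simple way to classify retrieved lines before a learner looks for usage patterns. Real DDL work would use a dedicated corpus tool.

```python
import re


def kwic(corpus, keyword, width=3):
    """Return keyword-in-context lines as (left context, keyword, right context).

    corpus: list of sentences (plain strings).
    width: number of tokens kept on each side of the keyword.
    """
    results = []
    for sentence in corpus:
        tokens = re.findall(r"[A-Za-z']+", sentence)  # crude word tokenizer
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                results.append((left, tok, right))
    return results


def group_by_next_word(lines):
    """Classify concordance lines by the word right after the keyword."""
    groups = {}
    for left, kw, right in lines:
        nxt = right.split()[0].lower() if right else ""
        groups.setdefault(nxt, []).append((left, kw, right))
    return groups
```

For example, grouping the hits for *pay* by the following word separates *pay attention to* from *pay the bill*, which is the kind of contrast a learner is asked to notice.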

Following this corpus-based, data-driven language learning process, this study combines multimedia technology with corpora and, guided by constructivist learning theory, DDL theory, and autonomous learning theory, constructs a DDL-based college English translation teaching model, as shown in Figure 1.

The DDL-based college English translation teaching mode has the following characteristics:
(1) DDL centers on students’ autonomous learning. Teachers guide, help, and demonstrate, and pay attention to cultivating students’ learning interest and autonomous learning ability.
(2) DDL-based teaching relies mainly on corpora and search engines, which provide detailed, rich, and authentic language data, build a realistic and effective learning environment for language learners, and improve their learning efficiency.
(3) Through comparative analysis of English-Chinese parallel corpora and monolingual English and Chinese corpora, learners develop habits of self-exploration and self-discovery, which makes the knowledge they acquire more meaningful and systematic and helps form long-term memory.
(4) DDL is a bottom-up, inductive, inquiry-and-discovery learning process, which conforms to the laws of second language acquisition and helps learners absorb and internalize new knowledge.

3.2. Design of the Automatic Translation Correction Model

In English translation training, students often make grammatical errors such as article errors, adjective phrase errors, prepositional phrase errors, tense errors, auxiliary and modal verb errors, and subject-verb agreement errors. To detect these errors, a grammar error detection algorithm that combines rules and statistics is adopted.

The rule-based method summarizes the forms that incorrect grammar takes in a large number of sentences and stores them as error rules in a rule base. When grammar is checked, the analyzed sentence is compared with the error rules in the rule base; if the sentence matches an error rule, it contains the corresponding grammatical error. Current automatic translation scoring algorithms do not take the impact of grammatical errors into account when analyzing a translation and thus cannot accurately determine the semantic relevance between the target and standard translations. To address these shortcomings, this paper proposes an automatic scoring model for English translation based on a mixed vector space, which analyzes the target translation comprehensively from two aspects: grammatical errors and semantic relevance.
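A minimal sketch of the rule-matching step, assuming the rule base is a list of regular expressions, each tagged with an error type. The three rules below are hypothetical examples chosen for illustration, not the paper’s rule base, and the article rule in particular will misfire on words such as *university* or *hour*, which is one reason rule-based detection is paired with statistics.

```python
import re

# Hypothetical rule base: (error type, compiled pattern over lowercased text).
# Illustrative only; a real rule base is much larger and needs exceptions.
RULES = [
    ("article", re.compile(r"\ba [aeiou]\w*")),                    # "a apple"
    ("subject-verb agreement", re.compile(r"\b(he|she|it) (don't|have|are)\b")),
    ("comparative", re.compile(r"\bmore (better|worse|bigger|smaller)\b")),
]


def detect_rule_errors(sentence):
    """Return (error type, matched text) for every rule the sentence matches."""
    text = sentence.lower()
    errors = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            errors.append((label, match.group(0)))
    return errors
```

For instance, *He don't like a apple.* triggers both the article rule and the agreement rule, while *She likes an apple.* matches none.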

The statistical grammar error detection algorithm mainly adopts an n-gram language model, which is based on the Markov assumption.

Let $S = w_1 w_2 \cdots w_m$ be a sentence of $m$ words, where $w_i$ ($1 \le i \le m$) is the $i$th word of the sentence and $p(w_i)$ is the probability of $w_i$ appearing in the corpus. The probability $p(S)$ of the sentence appearing in the corpus is calculated as follows:

$$p(S) = p(w_1 w_2 \cdots w_m) \approx \prod_{i=1}^{m} p(w_i \mid w_{i-n+1} \cdots w_{i-1}) \quad (1)$$

That is, the probability of the $i$th word depends only on the preceding $n-1$ words. This study mainly adopts the bigram model ($n = 2$), so formula (1) simplifies to

$$p(S) \approx \prod_{i=1}^{m} p(w_i \mid w_{i-1}), \qquad p(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i)}{C(w_{i-1})} \quad (2)$$

where $C(w_{i-1} w_i)$ is the number of times the word pair $w_{i-1} w_i$ appears in the corpus and $C(w_{i-1})$ is the number of times $w_{i-1}$ appears. The process of using the corpus to estimate this probability distribution is called training.
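The bigram estimate can be sketched in a few lines of Python: the probability of a word given the previous word is the count of the word pair divided by the count of the previous word, and a sentence’s probability is the product over its word pairs. This is an illustrative sketch, not the paper’s code: whitespace tokenization, the `<s>`/`</s>` boundary markers, and flagging zero-probability word pairs as candidate grammar errors are assumptions made here. With pure maximum-likelihood counts every unseen pair gets probability 0, so a practical detector would add smoothing.

```python
from collections import Counter


def _tokens(sentence):
    # Lowercase, split on whitespace, and add sentence boundary markers.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Count history words C(w_{i-1}) and word pairs C(w_{i-1} w_i)."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = _tokens(s)
        unigrams.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate C(prev word) / C(prev); 0 if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def sentence_prob(sentence, unigrams, bigrams):
    """Product of bigram probabilities over the sentence."""
    tokens = _tokens(sentence)
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= bigram_prob(prev, word, unigrams, bigrams)
    return p


def suspicious_bigrams(sentence, unigrams, bigrams, threshold=0.0):
    """Flag word pairs whose probability is at or below threshold."""
    tokens = _tokens(sentence)
    return [(a, b) for a, b in zip(tokens[:-1], tokens[1:])
            if bigram_prob(a, b, unigrams, bigrams) <= threshold]
```

Trained on the toy sentences *he likes apples*, *she likes apples*, and *he likes pears*, `suspicious_bigrams("he like apples", ...)` flags *he like* and *like apples*, because neither pair occurs in the training data.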

Let $Q = (q_1, q_2, \ldots, q_n)$ be the word vector matrix to be inferred. Our aim is not only to keep each vector $q_i$ close to its initial (distributional) vector $\hat{q}_i$ but also to shorten the distance between $q_i$ and its adjacent vectors $q_j$ in an undirected graph built from structural knowledge. This process is illustrated in Figure 2.

Through this transformation, the distance between each right node and its corresponding left node is shortened, and so is the distance between connected right nodes. The vector matrix $Q$ is therefore obtained from the following minimization:

$$\Psi(Q) = \sum_{i=1}^{n} \alpha \lVert q_i - \hat{q}_i \rVert^2 + \sum_{(i,j) \in E} \beta \lVert q_i - q_j \rVert^2$$

where $E$ is the edge set of the undirected graph, $\alpha$ is the coefficient that controls how closely the vector being learned $q_i$ stays to its initial vector $\hat{q}_i$, and $\beta$ controls how strongly adjacent vectors $q_i$ and $q_j$ are pulled together. Transforming the space in this way yields a mixed semantic space in which the similarity between synonyms is captured more accurately.
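One common way to minimize an objective that sums $\alpha\lVert q_i - \hat{q}_i\rVert^2$ over words and $\beta\lVert q_i - q_j\rVert^2$ over graph edges (each undirected edge counted once) is a retrofitting-style coordinate update. Setting the gradient with respect to $q_i$ to zero gives $q_i = (\alpha \hat{q}_i + \beta \sum_{j \in N(i)} q_j) / (\alpha + \beta\,|N(i)|)$, where $N(i)$ is the set of graph neighbors of word $i$. The Python sketch below applies that update repeatedly. It is a sketch under assumptions: the paper does not say how it solves the minimization, and the global `alpha`/`beta`, the synonym-pair edge list, and the iteration count are illustrative choices.

```python
def retrofit(initial, edges, alpha=1.0, beta=1.0, iterations=10):
    """Coordinate-descent minimization of
        sum_i alpha*||q_i - q0_i||^2 + sum_{(i,j) in E} beta*||q_i - q_j||^2

    initial: dict word -> list[float] (initial distributional vectors q0)
    edges:   iterable of (word, word) pairs, e.g. from a synonym lexicon
             (undirected; each pair listed once)
    """
    neighbors = {w: set() for w in initial}
    for a, b in edges:
        if a in initial and b in initial and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    q = {w: list(v) for w, v in initial.items()}  # start from the initial vectors
    for _ in range(iterations):
        for w, nbrs in neighbors.items():
            if not nbrs:
                continue  # words outside the graph keep their initial vector
            denom = alpha + beta * len(nbrs)
            q[w] = [(alpha * initial[w][d] + beta * sum(q[n][d] for n in nbrs)) / denom
                    for d in range(len(q[w]))]
    return q
```

With *big* and *large* linked as synonyms, their vectors move toward each other while an unlinked word such as *cat* keeps its initial vector.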

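The semantic relevance score between the translation to be tested $T$ and the standard translation $R$ is recorded as

$$S_{\text{sem}} = \frac{1}{m} \sum_{i=1}^{m} \cos(\mathbf{t}_i, \mathbf{r}_i)$$

where $\cos(\mathbf{t}_i, \mathbf{r}_i)$ is the cosine similarity between the $i$th sentence vector $\mathbf{t}_i$ of $T$ and the $i$th sentence vector $\mathbf{r}_i$ of $R$, and $m$ is the number of sentences. The value of $S_{\text{sem}}$ ranges from 0 to 1.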
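A sketch of the sentence-level comparison, assuming the test and standard translations are already split into aligned, tokenized sentences, that a sentence vector is the average of its word vectors (the paper does not say how sentence vectors are composed), and that the per-sentence cosines are averaged. The function names and the toy two-dimensional vectors are hypothetical; negative cosines are clipped to 0 so the score stays in the 0-1 range stated above, and `zip` silently ignores any unaligned trailing sentences.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def sentence_vector(tokens, vectors):
    """Average of the known word vectors (an assumed composition)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(v[d] for v in known) / len(known) for d in range(len(known[0]))]


def semantic_relevance(test_sents, ref_sents, vectors):
    """Mean cosine over aligned sentence pairs, each clipped to [0, 1]."""
    scores = []
    for t, r in zip(test_sents, ref_sents):
        tv, rv = sentence_vector(t, vectors), sentence_vector(r, vectors)
        sim = cosine(tv, rv) if tv is not None and rv is not None else 0.0
        scores.append(max(0.0, sim))
    return sum(scores) / len(scores) if scores else 0.0
```

If *likes* and *enjoys* share a vector, as they would after synonym-aware training, *he likes* and *he enjoys* score 1.0.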

Based on the translation scoring standard of Chinese college students’ examinations and a large number of scoring experiments, the grammar score is set at 25% of the total translation score and the semantic relevance score at 75%. The final score of the translation to be tested is

$$S_{\text{total}} = 0.25\, S_{\text{gram}} + 0.75\, S_{\text{sem}}$$

where $S_{\text{gram}}$ and $S_{\text{sem}}$ are the grammar score and the semantic relevance score, and $S_{\text{total}}$ ranges from 0 to 1. Following the weight of translation in CET-4 and CET-6 scores, the automatic score is multiplied by 15 in the experiments below, so the automatic translation score ranges from 0 to 15.
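The weighting and rescaling can be written as a small function. The 0.25/0.75 weights and the factor of 15 come from the text; `grammar_score`, which turns an error count into a 0-1 sub-score as one minus the error rate, is a hypothetical mapping, because the paper does not state how detected errors become a grammar score.

```python
def grammar_score(num_errors, num_words):
    """Hypothetical grammar sub-score in [0, 1]: 1 minus the error rate.
    The paper does not specify this mapping; it is an assumption."""
    if num_words == 0:
        return 0.0
    return max(0.0, 1.0 - num_errors / num_words)


def translation_score(s_gram, s_sem, w_gram=0.25, w_sem=0.75, scale=15):
    """Weighted sum of the two sub-scores, rescaled to the 0-15 range."""
    if not (0.0 <= s_gram <= 1.0 and 0.0 <= s_sem <= 1.0):
        raise ValueError("sub-scores must lie in [0, 1]")
    return scale * (w_gram * s_gram + w_sem * s_sem)
```

For instance, a grammar sub-score of 0.8 and a semantic score of 0.9 give 15 × (0.25 × 0.8 + 0.75 × 0.9) = 13.125 out of 15.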

3.3. Design of the AR Algorithm for Fragmented English Translation Resources

The ID3 (Iterative Dichotomiser 3) algorithm selects the attribute with the highest information gain as the test attribute of the current node and repeats this process recursively on each branch until a stopping condition is met, for example, when all samples at a node belong to the same class or no attributes remain.

The machine learning algorithm ID3 is used to determine each learner’s resource push strategy, generate a resource sequence for the learner, and then select the target learning resources and push them to the learner [25]. The system continually adjusts the push strategy based on learning feedback, thereby realizing AR of English reading resources. The AR model of learning resources for fragmented English translation constructed in this paper is shown in Figure 3.

Based on users’ different goals and needs and a full analysis of learners’ characteristics and learning states, this model adaptively matches the data types and specific content of the learning process and then automatically selects learning analysis methods and tools. The analysis results are then fed back visually to learners and other stakeholders. The model can adjust and recommend learning content adaptively according to learners’ learning states and preferences; learners can direct their own learning according to the analysis results, and stakeholders can use the same results to make teaching decisions, interventions, and personalized recommendations.

The model builds on a variety of learner data, which allows learning analysis to be carried out efficiently. The adaptive engine plays a crucial role in extracting and using learner data, applying learning analysis methods, selecting learning analysis tools, and presenting visual feedback. Based on these characteristics of the learner model, the data structure of the original ID3 algorithm is improved as follows. Before the user’s next login, the system automatically retrieves the decision tree saved in the user’s previous session and checks, in order, whether the root node and each node of its branches have changed. If nothing has changed, the previous decision tree is kept; if part of it has changed, the unchanged part keeps its previous nodes and the tree is rebuilt from the changed part; if it has changed completely, the entire decision tree is rebuilt.

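When the learner’s self-rated difficulty is moderate, the recommended resources are considered appropriate. That is, the category attribute $C$ (“recommended good or bad”) is obtained as follows:

$$C = \begin{cases} \text{good}, & \text{if the self-rated difficulty is moderate} \\ \text{bad}, & \text{otherwise} \end{cases}$$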

When pushing adaptive resources, eight attributes must be considered: question type, subject matter, difficulty, translation ability, cognitive style, learning goal, learning situation, and learning effect, denoted $t_1, t_2, \ldots, t_8$, respectively. Each attribute has its own value range.

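Take attribute $t_n$ as an example. The sample set $D$ contains $|D|$ samples, of which $|D_1|$ samples take the first value of $t_n$ and $|D_2|$ samples take the second value.

Furthermore, $D_1$ contains $a$ samples whose “recommended good or bad” category is good and $b$ samples whose category is bad, while $D_2$ contains $c$ good samples and $d$ bad samples. The conditional entropy of attribute $t_n$ is then (with $0 \log_2 0 = 0$)

$$E(t_n) = \frac{|D_1|}{|D|}\left(-\frac{a}{|D_1|}\log_2\frac{a}{|D_1|} - \frac{b}{|D_1|}\log_2\frac{b}{|D_1|}\right) + \frac{|D_2|}{|D|}\left(-\frac{c}{|D_2|}\log_2\frac{c}{|D_2|} - \frac{d}{|D_2|}\log_2\frac{d}{|D_2|}\right)$$

Similarly, the conditional entropies of the remaining attributes under the “recommended good or bad” category can be calculated.

The information gain of attribute $t_n$ is calculated as follows:

$$\mathrm{Gain}(t_n) = H(D) - E(t_n), \qquad H(D) = -\frac{a+c}{|D|}\log_2\frac{a+c}{|D|} - \frac{b+d}{|D|}\log_2\frac{b+d}{|D|}$$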
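The entropy, conditional entropy, and information gain for a two-valued attribute can be checked numerically with the following Python sketch. Here `a` and `b` are the numbers of “good” and “bad” samples in the subset taking the first value of the attribute, and `c` and `d` are the corresponding counts for the second value; the helper names are illustrative, and empty classes contribute 0 by the usual convention $0 \log_2 0 = 0$.

```python
import math


def h(*counts):
    """Entropy (base 2) of a class distribution given as raw counts."""
    n = sum(counts)
    # Skipping zero counts implements the convention 0 * log2(0) = 0.
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def conditional_entropy(a, b, c, d):
    """Weighted entropy of the two subsets induced by a two-valued attribute."""
    n = a + b + c + d
    return (a + b) / n * h(a, b) + (c + d) / n * h(c, d)


def information_gain(a, b, c, d):
    """Entropy of the whole set minus the conditional entropy."""
    return h(a + c, b + d) - conditional_entropy(a, b, c, d)
```

With a = 3, b = 1, c = 1, d = 3 the gain is about 0.189 bits, while a perfectly separating attribute (4, 0, 0, 4) reaches 1 bit and an uninformative one (2, 2, 2, 2) gives 0.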

From this, the information gain of every attribute can be calculated in the same way. The attribute with the largest information gain is selected as the splitting node, and a child node is created for each of its values. Repeating this classification recursively finally yields a decision tree for pushing fragmented reading resources.
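The recursive construction, together with the save-and-reuse step for returning users described earlier in this section, can be sketched in Python as follows. This is one possible reading, not the paper’s implementation: samples are dictionaries of attribute values, a leaf is a class label such as “good” or “bad”, a saved node counts as unchanged when the attribute with the highest gain on the new data is still the one stored at that node, and the attribute names and default class used below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy of the set minus the weighted entropy of the subsets split by attr."""
    cond = 0.0
    for v in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == v]
        cond += len(sub) / len(labels) * entropy(sub)
    return entropy(labels) - cond


def split(rows, labels, attr):
    """Partition samples by the value of attr: value -> (rows, labels)."""
    parts = {}
    for r, l in zip(rows, labels):
        sub_rows, sub_labels = parts.setdefault(r[attr], ([], []))
        sub_rows.append(r)
        sub_labels.append(l)
    return parts


def build(rows, labels, attrs):
    """ID3: a leaf is a label; an inner node is {'attr': a, 'children': {...}}."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    return {"attr": best,
            "children": {v: build(r, l, rest)
                         for v, (r, l) in split(rows, labels, best).items()}}


def update(tree, rows, labels, attrs):
    """Keep saved nodes whose split attribute is unchanged; rebuild the rest."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    if not isinstance(tree, dict) or tree["attr"] != best:
        return build(rows, labels, attrs)  # changed: rebuild this subtree
    rest = [a for a in attrs if a != best]
    tree["children"] = {v: update(tree["children"].get(v), r, l, rest)
                        for v, (r, l) in split(rows, labels, best).items()}
    return tree  # unchanged: reuse the saved node


def predict(tree, sample, default="bad"):
    """Walk the tree; an unseen attribute value falls back to a default class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(sample[tree["attr"]], default)
    return tree
```

Re-checking a saved node still requires recomputing information gains on the current data; what the reuse preserves is the structure of subtrees whose splits have not changed.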

4. Results Analysis and Discussion

4.1. Experimental Analysis of the System and Its Automatic Translation Correction Model

System performance is measured by read/write throughput, queries per second (QPS), memory consumption, and other indicators.

Learners and their learning activities generate learning analysis data, and the ultimate goal of learning analysis is to support effective and personalized learning. Digital portraits of learners can be created in three dimensions: point, line, and plane. The point dimension refers to real-time collection of ontology data: learners are recorded instantly from various angles, much as a person can be photographed from several sides and the shots stitched into a three-dimensional image, and these data form an instantaneous, multidimensional picture of the learner at a given point in time.

The data generated by learners during the learning process are known as behavioral data. Such data are dynamic and continuous and are typically recorded and stored by computers automatically and in real time. They can be divided into modules according to demand, such as learning satisfaction, learning engagement, academic achievement, and learning social networks, and each module has its own set of data that can be gathered and quantified. Learner data can support an understanding of learners’ behavior and learning status only once they have been collected at a sufficiently fine granularity.

Figure 4 compares the throughput of this system with that of a traditional Socket system.

The above rules serve as a useful guide for data processing and as a reminder that laws and ethical norms should apply throughout learning analysis, constraining every step of data processing. Before data are collected and processed, learners should be fully informed of the method, content, and scope of data use, shown the authorization letter for data processing, and asked for their approval, in order to improve the transparency of data use and processing. The teacher is thus no longer merely a transmitter of knowledge but also a designer of data-driven teaching, a counselor on corpus resources, an instructor, a promoter and evaluator of data-driven learning, and a collaborative learning partner. Figure 5 depicts the experimental results.

The QPS of this system is 130, while that of the Socket system (the control group) is 40; that is, this system’s QPS is more than three times that of the control group. In the QPS test, the system thus performs far better than the Socket system.

In this experiment, concurrency was taken as the independent variable and memory consumption as the dependent variable, and the two systems were compared. The test results are shown in Figure 6.

At present, owing to various constraints, most middle schools cannot offer English writing courses. Teachers should therefore seek countermeasures in everyday teaching and purposefully adopt various means to strengthen students’ writing ability, for example, rewriting after reading a text, combining speaking with writing, discussing before writing, and writing after listening.

In addition, as part of regular training, teachers should encourage students to use complex sentence patterns appropriately, make full use of their strengths, tap their potential, and fully display their own language style. That is, besides the accuracy of language, its richness and diversity must also be considered, so that rich content is expressed in a variety of ways. For example, students can draw on model texts and expressions and learn and memorize a few well-crafted sentences. Teachers should ask students to write paragraphs with clear topic sentences, strong supporting sentences, and incisive concluding sentences. When writing, students should remember the two basic principles of unity and coherence; using transitional elements in the right places makes the whole text flow smoothly.

The composition scores of this system were compared with manual scores for 500 CET-4 and CET-6 compositions, and the distribution of teacher scores and automatic scores is shown as a scatter plot (Figure 7).

As Figure 7 shows, the scores given by this model are generally close to manual scores. The average manual score is 74.21 and the average system score is 73.77; the scoring error is 4.93, and the Pearson correlation coefficient is 0.84.

The system scores were compared with manual scores for 500 CET-4 and CET-6 translations, and the distribution of teacher scores and automatic scores is shown as a scatter plot (Figure 8).

As Figure 8 shows, the overall English translation scores given by this system are generally close to manual scores. The average manual score is 9.82 and the average system score is 10.92; the average scoring error is 1.39, and the Pearson correlation coefficient is 0.88.
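The agreement statistics reported for Figures 7 and 8 can be computed as sketched below. The paper does not define its scoring error; this sketch assumes it is the mean absolute difference between system and manual scores. The short score lists in the test are toy values, not data from the experiments.

```python
import math


def mean_absolute_error(system, manual):
    """Average absolute difference between paired system and manual scores."""
    return sum(abs(s - m) for s, m in zip(system, manual)) / len(system)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

On Python 3.10 and later, `statistics.correlation` returns the same Pearson coefficient.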

4.2. Learning Process Analysis

According to the learner model, to push resources adaptively the system needs to know the learner’s level and goal, where and how the learner studies, and the learning effect of the pushed resources. Because learning places change frequently in mobile learning, information about the learning situation has a greater influence on the effect of fragmented learning than in other environments.

Learners who used the mobile application continuously were selected from the learning records, and the daily exercise accuracy of typical learners over one month of study was analyzed, as shown in Figure 9.

Figure 9 shows that learners continued to use the mobile application to learn English translation and that their overall learning effect increased over time, suggesting that using the mobile application for the CET-4 careful reading module can improve learners’ ability. The term “learner corpus” refers to a collection of texts written in English by nonnative speakers. The gap between the writing of English learners and that of native speakers is not limited to vocabulary. The feature values of three typical learners, grouped by learner category, were normalized to a common range, and the resulting radar chart of feature distributions is shown in Figure 10.
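The normalization step can be sketched with min-max scaling, which maps each feature to [0, 1] across the learners being compared so that features with different units share one radar-chart scale. The paper only says the value ranges were unified, so min-max scaling is an assumption here, as are the feature names in the test.

```python
def min_max_normalize(profiles):
    """Rescale each feature to [0, 1] across learners.

    profiles: dict learner -> dict feature -> raw value (same features for all).
    A feature with identical values for every learner is mapped to 0.0.
    """
    features = next(iter(profiles.values())).keys()
    out = {name: {} for name in profiles}
    for f in features:
        values = [p[f] for p in profiles.values()]
        lo, hi = min(values), max(values)
        for name, p in profiles.items():
            out[name][f] = 0.0 if hi == lo else (p[f] - lo) / (hi - lo)
    return out
```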

Figure 10 shows that all three types of learners prefer to study in a quiet environment, and that the learning effect of each type improves steadily during learning.

The continuously rising learning effect suggests that the learner features and resource features selected in the push model are reasonable and sensitive to changes in the learner model. In the next stage of system improvement, the pushed learning resources can be adjusted more quickly in response to such changes.

5. Conclusion

DDL-based college English translation instruction is a cutting-edge teaching concept. By using corpora and search engines, students gain full access to authentic language knowledge and cultural information, learn in a realistic and effective language environment, and improve their language skills. The English writing and translation training system proposed in this paper can replace manual correction and evaluation, offering teachers a more time- and labor-efficient way to correct English assignments, helping to alleviate China’s current teacher shortage, and serving as a supplement to English instruction. It can not only relieve teachers’ workloads but also boost students’ intrinsic motivation for writing and translation, cultivate their awareness of themselves as active learners, and improve their writing and translation skills. By collecting learning record data and applying correlation analysis, multivariate analysis of variance, regression analysis, and cluster analysis, this paper examines resource recommendation and learning effect, the model’s validity, the categories of resources and learners, and the learning process. The analysis of typical learners shows that the learning effect of different learners using this mobile application improved to varying degrees.

The automatic translation correction model of this system can still be improved. Although its grammatical error detection module detects most grammatical errors, a small number remain difficult to identify, and future research will focus on this area.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares that there are no conflicts of interest.