Abstract

English learners regard writing as the part of English learning with the widest application, the deepest understanding required, and the most challenging instruction. Automatic detection and correction (DAC) of grammatical errors in English writing is therefore critical to both learning and teaching. The goal of this research is to investigate a sequence annotation model and a Seq2Seq NN model, both based on recurrent NNs, and to apply them to the DAC of English grammatical errors (EGE). Using the sequence annotation model developed here, this paper proposes an EGE DAC approach based on sequence annotation. It also presents an EGE DAC approach based on Seq2Seq that integrates the sequence annotation model. The model is no longer trained on a single type of grammatical error but on all error types combined, allowing it to respond to any EGE. This work focuses on the DAC of grammatical errors with fixed confusion sets, such as prepositions and articles. The model's F1 value for article error correction is 38.05 percent, which is 33.40 percent higher than the F1 value for UIUC article error correction; its F1 value for preposition error correction is 28.89 percent, which is 7.22 percent higher than the F1 value for UIUC preposition error correction.

1. Introduction

Educational artificial intelligence is a new research topic that combines education with artificial intelligence, focusing on the application of AI to teaching and educational management. Natural language technology is evolving rapidly alongside the continued growth of the big data era. Because natural language technology is already widely employed in dialogue systems, public opinion analysis, and text classification, the benefit of incorporating artificial intelligence into automatic English writing tests becomes even more apparent during an epidemic.

Automatic test technology for English writing draws on pattern recognition [1], natural language processing, machine learning [2], and other domains. Owing to its superior learning and performance capabilities, deep learning has a wide range of applications beyond machine vision and is also commonly used in natural language processing. Text similarity is currently the central problem for computerized scoring algorithms for English writing; it is also used in document classification, intelligent question answering systems, and related-article recommendation systems, among other applications. This study uses deep neural network (NN) technology to increase the accuracy of the automatic composition scoring system. To improve the system's interactivity, it combines writing feedback theory with data visualization guidelines to develop a new feedback report. Artificial intelligence technology will transform the feedback system for English writing and make truly personalized English writing instruction possible.

The innovations of this paper are as follows: (1) this paper proposes an NN model that effectively solves sequence labeling. Unlike previous annotation models, it integrates character, word, and sequence information, introduces coarse-grained learning, and divides the annotation process into two stages, making annotation more robust. (2) This paper proposes an EGE DAC method based on the sequence annotation model. The method uses the proposed model to label grammatical errors, avoiding the manual extraction of large numbers of features required by traditional methods. (3) A method of English grammar error detection and correction using the Seq2Seq NN model is proposed. This method maps the original sequence directly to the target sequence without distinguishing error types.

2. Related Work

The English writing training system will greatly contribute to the reform and innovation of traditional teaching structures. Liu G proposed an improved group-particle walking-path optimization algorithm to address the shortcomings of the traditional model and the characteristics of intelligent English writing; the experimental study shows that the constructed model exhibits a degree of intelligence. Lee H investigated the impact of a Korean university English writing course focused on self- and peer-review on learner autonomy, in order to find ways to improve the quality of English writing courses. Aliyev and Ismayilova examined the effectiveness of incorporating films into online-supported English writing instruction; the study played a crucial role in the lives of all participating students [3]. Scientific writing is difficult, and Makarenkov et al. proposed a new machine-learning-based application for correct word choice tasks, while state-of-the-art grammar error correction relies on error-specific classifiers and machine translation methods [4]. Aljunaeidia et al. implemented a set of preprocessing functions for handwritten Arabic characters using contour analysis, feeding the contour vectors into a neural network for recognition; the proposed algorithm achieves a recognition rate of about 97% [5]. Tarawneh investigated the effect of using smart boards on improving the English writing skills of 9th-grade female students in the South Al-Mazar Education Board in 2017-2018, designing a learning tool (pre-test) to test the validity and reliability of students' writing skills [6]. Because Speech Emotion Recognition (SER) is challenging yet promising, Gunawan et al. recognized human speech emotions using Deep Neural Networks (DNN), extracting the selected speech feature Mel Frequency Cepstrum Coefficients (MFCC) from raw audio data [7]. Although this research contributes greatly to human speech emotion analysis, its scope of application remains limited.

3. Intelligent Test Method of English Writing Based on Neural Network

3.1. Traditional English Writing Test Methods

The benefits of incorporating artificial intelligence technology [8, 9] into teaching are particularly obvious during an epidemic. Teachers can teach and grade homework from anywhere in the world: students upload their coursework as photographs, and teachers correct the assignments from the photos. Although this way of working incorporates modern technology, the workload remains very high. At the same time, examination is one of the main strategies for evaluating students' learning outcomes, and the marking workload for teachers is enormous. Objective questions and written (subjective) questions are the most common forms of English exam questions.

At present, in the marking of objective questions, the computer can assign scores by matching students' answers against the standard answers, whereas English writing is still judged essentially by manual correction of examination papers. That process can be repetitive, time-consuming, and not necessarily reliable [10]. Many external factors affect the manual scoring of subjective questions; differences in raters' subjective judgment lead to different scores, making the results neither fair nor consistent. It is therefore of great value and significance to study automatic scoring technology for English writing [11]. The technology is not yet mature, but with the development of natural language processing, integrating deep learning into the automatic scoring system will improve its accuracy. When a deep learning algorithm [12] is applied to automatic scoring, a computer can score the English writing in a large number of test papers faster and more fairly. This greatly reduces the workload of raters and makes the scoring of English writing more fair and objective.

3.2. Basis of Neural Network

An NN is a computational model inspired by biology. In a biological NN, different neurons are interconnected; the scope of their interaction is shown in Figure 1.

Building on the structure in Figure 1, the back propagation algorithm for multilayer NNs, namely the BP NN model, is adopted and improved [13]. The back propagation algorithm adjusts parameters such as neuron weights efficiently and has strong learning ability; it remains one of the most popular NN algorithms at present. Its model is shown in Figure 2.

As shown in Figure 2, convolutional NN training mainly consists of two stages: forward propagation and back propagation [14]. In the forward propagation stage, the data start from the input layer, pass through each subsequent layer in turn, and finally reach the output layer; at every layer, the output value of a node is obtained by a weighted summation over the outputs of the nodes in the previous layer. At this stage, the convolution kernels must be initialized before model training [15]. The back propagation stage propagates the difference between the current output and the target value backward from the output layer: the error between the output and the target is calculated, and the weight matrices are adjusted to minimize that error [16].
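As a concrete illustration of the two stages, the following minimal NumPy sketch (with hypothetical layer sizes and randomly generated data, not the network used in this paper) runs one forward pass through a single hidden layer and then propagates the output error backward to adjust both weight matrices by gradient descent.

```python
import numpy as np

# Hypothetical sizes and data, purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))                # input vector
t = rng.normal(size=(2, 1))                # target value
W1 = rng.normal(scale=0.1, size=(3, 4))    # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(2, 3))    # hidden -> output weights
lr = 0.1                                   # learning rate

# Forward propagation: weighted summation layer by layer.
h = np.tanh(W1 @ x)                        # hidden layer output
y = W2 @ h                                 # output layer value

# Back propagation: the error between output and target is propagated
# backwards and the weight matrices are adjusted to reduce it.
err_out = y - t                            # gradient of 0.5*||y - t||^2 w.r.t. y
grad_W2 = err_out @ h.T
err_hid = (W2.T @ err_out) * (1 - h ** 2)  # tanh derivative
grad_W1 = err_hid @ x.T

W2 -= lr * grad_W2
W1 -= lr * grad_W1
```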

The least square method usually uses the mean square loss function, which is given by

$$L_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^{2}.$$

Among them, $y$ is the real value and $\hat{y}$ is the predicted value. When the mean square loss function is used, its partial derivative becomes very small as the output probability approaches 0 or 1, so the gradients almost vanish when the model starts training and the initial training rate is very slow. The cross-entropy loss function can be expressed as

$$L_{\mathrm{CE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log a_i + \left(1-y_i\right)\log\left(1-a_i\right)\right],$$

where $a$ indicates the predicted probability of the positive class and $N$ represents the total number of samples [17]. Adding an extra term to the loss function is called regularization. It constrains the parameters of the model to reduce the possibility of overfitting, which lowers the complexity of the model and improves its generalization ability. The mathematical expression is

$$\tilde{J}(w;x,y) = J(w;x,y) + \alpha\,\Omega(w).$$

Among them, $x$ and $y$ are the training set samples and the corresponding labels, $w$ is the weight coefficient vector, $J(w;x,y)$ is the objective function, and $\Omega(w)$ is the penalty term with regularization coefficient $\alpha$. Common penalty terms include L1 regularization and L2 regularization [18].
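The following short NumPy sketch illustrates the three quantities defined above: the mean square loss, the cross-entropy loss computed from positive-class probabilities, and an objective with L1/L2 penalty terms added. The arrays and the regularization coefficient alpha are hypothetical values chosen only for illustration.

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 1.0])        # real values / labels
y_hat = np.array([0.9, 0.2, 0.7, 0.6])    # predicted probabilities a
w = np.array([0.5, -1.2, 0.3])            # weight coefficient vector
alpha = 0.01                              # regularization coefficient (hypothetical)

# Mean square loss.
mse = np.mean((y - y_hat) ** 2)

# Cross-entropy loss for binary prediction probabilities a.
ce = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# L1 and L2 penalty terms, added to the objective as regularization.
l1_penalty = alpha * np.sum(np.abs(w))
l2_penalty = alpha * np.sum(w ** 2)
regularized_objective = mse + l2_penalty

print(mse, ce, l1_penalty, regularized_objective)
```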

3.3. NN Word Vector Representation

The characteristic of the CBOW (continuous bag-of-words) model is that the input is the word vectors of the words before and after a central word, and the output is the predicted probability of the central word. It works well on a small corpus. The CBOW model needs to compute the probability of generating the central word from its context words [19]. Given the context (background) words $\mathcal{C}=\{w_{o_1},\ldots,w_{o_{2m}}\}$, let $u_i$ be the vector of word $i$ when it acts as the central word and $v_i$ its vector when it acts as a background word. The conditional probability is

$$P\left(w_c \mid \mathcal{C}\right)=\frac{\exp\left(u_c^{\top}\bar{v}\right)}{\sum_{j=1}^{|V|}\exp\left(u_j^{\top}\bar{v}\right)},\qquad \bar{v}=\frac{1}{2m}\sum_{k=1}^{2m}v_{o_k}.$$

Among them, $\mathcal{C}$ represents the context information.

It is assumed that the text to be trained contains $S$ words, the $t$-th word is $w_t$, and the window size is $m$. Then the likelihood function of CBOW is expressed as

$$\prod_{t=1}^{S}P\left(w_t \mid w_{t-m},\ldots,w_{t-1},w_{t+1},\ldots,w_{t+m}\right),$$

which represents the probability of generating every central word [20].

The loss function used is the negative log-likelihood

$$L=-\sum_{t=1}^{S}\log P\left(w_t \mid w_{t-m},\ldots,w_{t-1},w_{t+1},\ldots,w_{t+m}\right).$$

Differentiating gives the gradient with respect to any background word vector $v_{o_k}$ ($k=1,2,\ldots,2m$):

$$\frac{\partial \log P\left(w_c\mid\mathcal{C}\right)}{\partial v_{o_k}}=\frac{1}{2m}\left(u_c-\sum_{j=1}^{|V|}P\left(w_j\mid\mathcal{C}\right)u_j\right).$$

For the skip-gram model, taking word $c$ as the central word and word $o$ as a background word, the probability of generating the background word from the central word is

$$P\left(w_o\mid w_c\right)=\frac{\exp\left(v_o^{\top}u_c\right)}{\sum_{j=1}^{|V|}\exp\left(v_j^{\top}u_c\right)}.$$

The goal is to maximize the likelihood function. In the training process, the loss function used is

$$L=-\sum_{t=1}^{S}\sum_{\substack{-m\le k\le m,\; k\ne 0}}\log P\left(w_{t+k}\mid w_t\right).$$

The gradient descent method is mainly used for parameter updating. The gradient with respect to the central word vector is

$$\frac{\partial \log P\left(w_o\mid w_c\right)}{\partial u_c}=v_o-\sum_{j=1}^{|V|}P\left(w_j\mid w_c\right)v_j.$$
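To make the CBOW computation concrete, the sketch below uses a tiny hypothetical vocabulary and randomly initialized vector tables: it averages the background-word vectors, scores every vocabulary word against that average, and applies a softmax to obtain the conditional probability of the central word, following the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]           # hypothetical tiny vocabulary
dim = 8
U = rng.normal(scale=0.1, size=(len(vocab), dim))    # central-word vectors u_i
V = rng.normal(scale=0.1, size=(len(vocab), dim))    # background-word vectors v_i

def cbow_probability(center, context):
    """P(center | context) under the CBOW model."""
    v_bar = V[[vocab.index(w) for w in context]].mean(axis=0)  # average context vector
    scores = U @ v_bar                                         # u_j^T v_bar for every word j
    scores -= scores.max()                                     # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()              # softmax over the vocabulary
    return probs[vocab.index(center)]

# Window size m = 2: two background words on each side of "sat".
print(cbow_probability("sat", ["the", "cat", "on", "mat"]))
```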

3.4. RNN Training Process

RNN is divided into two training processes: forward propagation and back propagation, and iterates with timing as the core. The training process is as follows:

3.4.1. Forward Propagation

Assume that the input vector of the hidden layer is $r$, the weight matrix between the input layer and the hidden layer is $U$, the recurrent weight matrix between hidden layers is $W$, the input vector is $x$, and the output vector of the hidden layer at the previous time step is $q$. The input vector of the hidden layer at time $t$ is

$$r_t = U x_t + W q_{t-1}.$$

The output vector of the hidden layer at time $t$ is

$$q_t = f\left(r_t\right).$$

Here, $q_t$ represents the output vector of the hidden layer at time $t$, and $f(\cdot)$ represents the hidden-layer activation function.
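A minimal NumPy sketch of this forward computation is given below, with hypothetical dimensions: at each time step the hidden-layer input combines the current input with the previous hidden output, and the activation function produces the new hidden output.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, hid_dim, steps = 4, 3, 5
U = rng.normal(scale=0.1, size=(hid_dim, in_dim))    # input -> hidden weights
W = rng.normal(scale=0.1, size=(hid_dim, hid_dim))   # hidden -> hidden (recurrent) weights
xs = rng.normal(size=(steps, in_dim))                # input sequence x_1..x_T

q = np.zeros(hid_dim)                                # previous hidden output q_0
for t in range(steps):
    r = U @ xs[t] + W @ q        # hidden-layer input r_t at time t
    q = np.tanh(r)               # hidden-layer output q_t = f(r_t)
print(q)
```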

3.4.2. Back Propagation

BPTT (back propagation through time) is a commonly used algorithm for training RNN. In essence, it is developed based on the BP algorithm [21, 22]. The training process is as follows:

Assume that the error function is $E$ and that the error of node $j$ is obtained by the chain rule. Writing $e_t$ for the output-layer error and $V$ for the hidden-to-output weight matrix, the error of the hidden layer at time $t$ is

$$\delta_t=\frac{\partial E}{\partial r_t}=f'\left(r_t\right)\odot\left(V^{\top}e_t+W^{\top}\delta_{t+1}\right).$$

Then the weights are derived; here the gradient descent method is adopted, and the derivatives are

$$\frac{\partial E}{\partial W}=\sum_{t}\delta_t\,q_{t-1}^{\top},\qquad \frac{\partial E}{\partial U}=\sum_{t}\delta_t\,x_t^{\top}.$$

Then, according to the learning rate $a$, the weight adjustment is calculated as

$$W\leftarrow W-a\,\frac{\partial E}{\partial W},\qquad U\leftarrow U-a\,\frac{\partial E}{\partial U}.$$
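The sketch below illustrates one BPTT update in NumPy, assuming the forward formulation above plus a hypothetical linear output layer V and a squared-error loss; it is an illustrative simplification rather than the exact network trained in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, hid_dim, out_dim, T = 4, 3, 2, 6
U = rng.normal(scale=0.1, size=(hid_dim, in_dim))    # input -> hidden
W = rng.normal(scale=0.1, size=(hid_dim, hid_dim))   # hidden -> hidden
V = rng.normal(scale=0.1, size=(out_dim, hid_dim))   # hidden -> output (assumed linear layer)
a = 0.05                                             # learning rate
xs = rng.normal(size=(T, in_dim))                    # input sequence
ds = rng.normal(size=(T, out_dim))                   # target sequence

# Forward propagation, keeping every hidden output for BPTT.
qs = [np.zeros(hid_dim)]
ys = []
for t in range(T):
    qs.append(np.tanh(U @ xs[t] + W @ qs[-1]))
    ys.append(V @ qs[-1])

# Back propagation through time: the hidden-layer error at time t combines
# the output error at t with the error flowing back from time t + 1.
gU, gW, gV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
delta_next = np.zeros(hid_dim)
for t in reversed(range(T)):
    e = ys[t] - ds[t]                                          # output-layer error
    delta = (1 - qs[t + 1] ** 2) * (V.T @ e + W.T @ delta_next)
    gV += np.outer(e, qs[t + 1])
    gW += np.outer(delta, qs[t])
    gU += np.outer(delta, xs[t])
    delta_next = delta

# Weight adjustment with learning rate a.
U -= a * gU; W -= a * gW; V -= a * gV
```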

The RNN solves the problem that the BP NN cannot remember time series, but it also suffers from memory degradation and gradient explosion or vanishing, which affect prediction accuracy [23].

3.5. LSTM Model Structure

LSTM is the long short-term memory network. Like the RNN, it has a chain structure. LSTM can capture long-distance word information, whereas the RNN has difficulty integrating long-distance information into the current state [24]. Therefore, when processing long sequences, LSTM performs much better than RNN. The unit structure of the LSTM model is shown in Figure 3.

As shown in Figure 3, each LSTM layer contains three parts: forgetting gate, input gate, and output gate. The goal of LSTM is to control the transmission of information through these three control gates, so as to solve the possible gradient disappearance phenomenon in NN. Using these gating mechanisms, LSTM can also be used to build encoders and decoders. It has achieved good results in the field of machine translation. LSTM selectively discards the information of each unit through three gates. The working states of the three gates are as follows:

The first step uses the forgetting gate to control how much information from the previous step can be passed on, selectively transmitting the information from the upper layer to the next. It is realized by a sigmoid layer, and the forgetting gate value is

$$f_t=\sigma\left(W_f\left[h_{t-1},x_t\right]+b_f\right).$$

The second step is realized by two NN layers, a sigmoid layer and a tanh layer. The first determines which information is updated:

$$i_t=\sigma\left(W_i\left[h_{t-1},x_t\right]+b_i\right).$$

The second creates the new candidate data

$$\tilde{C}_t=\tanh\left(W_C\left[h_{t-1},x_t\right]+b_C\right),$$

and the two values are combined to update the cell state:

$$C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t.$$

The final output gate determines the output value. This step decides which information to output through a sigmoid layer, converts the updated cell state to a value between −1 and 1 through tanh, and computes the output gate value $o_t$ and the output value $h_t$:

$$o_t=\sigma\left(W_o\left[h_{t-1},x_t\right]+b_o\right),\qquad h_t=o_t\odot\tanh\left(C_t\right).$$
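The gate computations can be summarized in a short NumPy sketch of a single LSTM step; the weight matrices, biases, and dimensions below are hypothetical and serve only to illustrate how the three gates combine.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One LSTM unit step with forgetting, input, and output gates."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(Wf @ z + bf)                 # forgetting gate: keep or drop old cell info
    i = sigmoid(Wi @ z + bi)                 # input gate: which information is updated
    c_tilde = np.tanh(Wc @ z + bc)           # new candidate data
    c = f * c_prev + i * c_tilde             # updated cell state
    o = sigmoid(Wo @ z + bo)                 # output gate
    h = o * np.tanh(c)                       # output value, squashed to (-1, 1) by tanh
    return h, c

# Hypothetical dimensions, purely for illustration.
rng = np.random.default_rng(0)
in_dim, hid_dim = 4, 3
mats = [rng.normal(scale=0.1, size=(hid_dim, hid_dim + in_dim)) for _ in range(4)]
biases = [np.zeros(hid_dim) for _ in range(4)]
h, c = np.zeros(hid_dim), np.zeros(hid_dim)
x = rng.normal(size=in_dim)
h, c = lstm_step(x, h, c, *mats, *biases)
```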

4. Experiment on Grammatical Errors and Correction in English Writing

This work designs an EGE DAC experiment based on the sequence annotation EGE DAC method and uses open data sources. The experiment is carried out with the data division proposed in prior research. For part-of-speech tagging, the WSJ news corpus and an English composition corpus written by Chinese students are used. The WSJ corpus contains 45 distinct part-of-speech tags and is divided into 25 parts. Following earlier studies, parts 0–18 form the training set, parts 19–21 the verification set, and parts 22–24 the test set. The specific data distribution of these corpora is shown in Table 1.

As shown in Table 1, the English composition corpus written by Chinese students is drawn from compositions collected by the correction network and involves 44 different part-of-speech tags (all contained in the 45 tags of the WSJ corpus). The corpus includes 13762 English sentences, of which 10000 are used as the training set, 2416 as the verification set, and 1346 as the test set. Named entity recognition uses the CoNLL-2003 shared task corpus, with the same setting as previous work: the first 14987 sentences as the training set, 3466 as the verification set, and 3684 as the test set.

4.1. DAC of EGE in Sequence Tagging

The sequence annotation model based on recurrent NN is designed to perform DAC of EGE. Because most English grammatical errors concern part of speech, tense, or the use of articles and prepositions within a sentence, this paper compiles statistics on the CoNLL EGE DAC evaluation data; the proportions are shown in Figure 4.

As shown in Figure 4, many error types are marked in the data, but the evaluation task mainly targets five: article errors, preposition errors, noun errors, subject-predicate agreement errors, and verb form errors. The statistics show that, among these five common EGE types, article and preposition errors account for a high proportion, and their confusion sets are relatively fixed.

4.2. DAC of EGE in Seq2Seq

The sequence annotation model based on recurrent NN used in this paper to DAC EGE cannot handle problems such as missing words. Moreover, when dealing with EGE types whose confusion sets are not fixed, it depends on resources such as word-form-change corpora. Therefore, the sequence labeling model is incorporated into the Seq2Seq model, and the DAC of EGE is solved by mapping the original sequence to the target sequence.

4.2.1. System Architecture

This paper designs and implements the EGE DAC system as three modules: a text preprocessing module, a sequence-tagging-based EGE DAC module, and a Seq2Seq-based EGE DAC module. The architecture is shown in Figure 5.

As shown in Figure 5, the sequence-annotation-based EGE DAC module includes a preposition error DAC module and an article error DAC module, while the Seq2Seq-based EGE DAC module includes an Encode module and a Decode module.

4.2.2. Encode Layer

The Encode part of the Seq2Seq model needs to encode the information in the sequence text so that subsequent decoding can draw on richer semantic information. Following the sequence annotation model designed above, the Encode part is built as the network structure shown in Figure 6.

As shown in Figure 6, each position in the input sequence text is represented by a vector produced by the Encode layer, and this vector representation integrates the contextual information of the word. Inside the Encode layer, the CRNN structure designed for sequence annotation extracts the character-level vector of each word. The character-level vector information is then fed into the BLSTM, which synthesizes the context to produce the final semantic vector.
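A simplified PyTorch sketch of such an Encode layer is shown below. It substitutes a plain character-level convolution for the paper's CRNN character module, and all vocabulary sizes and dimensions are hypothetical; the point is only to show character vectors being combined with word embeddings and fed through a BLSTM to produce context-aware semantic vectors.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Character-level CNN + word embedding + BLSTM encoder (illustrative sketch)."""

    def __init__(self, n_words=10000, n_chars=100, word_dim=100, char_dim=30,
                 char_filters=30, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Convolution over the characters of each word (stand-in for the CRNN character module).
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.blstm = nn.LSTM(word_dim + char_filters, hidden,
                             batch_first=True, bidirectional=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        b, s, w = char_ids.shape
        chars = self.char_emb(char_ids).reshape(b * s, w, -1).transpose(1, 2)
        char_vec = self.char_cnn(chars).max(dim=2).values.reshape(b, s, -1)  # pool over characters
        words = self.word_emb(word_ids)
        ctx, _ = self.blstm(torch.cat([words, char_vec], dim=-1))
        return ctx   # (batch, seq_len, 2 * hidden): context-aware semantic vectors

# Usage with hypothetical toy inputs.
enc = Encoder()
word_ids = torch.randint(0, 10000, (2, 7))     # 2 sentences of 7 words
char_ids = torch.randint(0, 100, (2, 7, 12))   # each word padded to 12 characters
print(enc(word_ids, char_ids).shape)           # torch.Size([2, 7, 256])
```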

4.2.3. Decode Layer

In the Decode part of Seq2Seq, the encoded data is decoded to obtain the final mapping from the original sequence to the target sequence. The structure of the attention Decode layer is shown in Figure 7.

As shown in Figure 7, after the semantic information is obtained in the Encode part, it is decoded. The Decode layer is built with an attention mechanism: the decoding information at time t depends on the decoding information at time t−1 as well as on the encoded semantic information. The attention mechanism assigns a weight to each semantic vector of the input text sequence, so that decoding at time t uses the semantic vectors of the whole input sequence with a learned bias toward the most relevant positions.
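The following PyTorch sketch shows one decoding step with additive attention over the encoder's semantic vectors. The layer sizes, the scoring function, and the use of an LSTM cell are assumptions made for illustration rather than the exact Decode layer of this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoder(nn.Module):
    """One-step attention decoder over the encoder's semantic vectors (illustrative sketch)."""

    def __init__(self, n_words=10000, emb_dim=100, enc_dim=256, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(n_words, emb_dim)
        self.attn_score = nn.Linear(enc_dim + hidden, 1)     # additive attention scorer
        self.rnn = nn.LSTMCell(emb_dim + enc_dim, hidden)
        self.out = nn.Linear(hidden, n_words)

    def forward(self, prev_word, state, enc_outputs):
        # prev_word: (batch,); state: (h, c), each (batch, hidden)
        # enc_outputs: (batch, src_len, enc_dim) semantic vectors from the Encode layer
        h, c = state
        src_len = enc_outputs.size(1)
        # Score every encoder position against the decoder state from time t-1.
        query = h.unsqueeze(1).expand(-1, src_len, -1)
        scores = self.attn_score(torch.cat([enc_outputs, query], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)                   # attention weight per position
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)  # weighted sum
        h, c = self.rnn(torch.cat([self.emb(prev_word), context], dim=-1), (h, c))
        return self.out(h), (h, c), weights                   # word scores at time t

# Usage with hypothetical toy tensors.
dec = AttentionDecoder()
enc_outputs = torch.randn(2, 7, 256)
state = (torch.zeros(2, 256), torch.zeros(2, 256))
logits, state, weights = dec(torch.tensor([1, 2]), state, enc_outputs)
print(logits.shape, weights.shape)   # torch.Size([2, 10000]) torch.Size([2, 7])
```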

4.3. English Grammar Error DAC

The Encode layer and the Decode layer are designed as described above. During encoding, the Encode layer's network structure is similar to the sequence annotation network. During decoding, the Attention mechanism is introduced to balance the overall semantic information against the weight of the current input. The Seq2Seq structure is shown in Figure 8.

As shown in Figure 8, the input of the model is set to sentences containing grammatical errors and the output to sentences without grammatical errors; the two sentences in a pair need not have the same length. Moreover, when solving the DAC of EGE in this way, the error types no longer need to be distinguished: whether a word is misused or missing, the error can be corrected directly.

4.4. Parameter Setting

Before the NN is trained, its hyperparameters are initialized. The word vectors are initialized from the GloVe 6B vector table, which performs well on word similarity evaluation and the NER task; it is trained on text containing 6 billion words from sources such as Wikipedia. The hyperparameter settings are shown in Table 2.

As shown in Table 2, when the NN is trained, dropout layers are added to the input and output of the recurrent NN to regularize training and prevent overfitting. A comparative experiment is also conducted on whether to use dropout.
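A minimal PyTorch sketch of this arrangement is shown below; the dropout probability and layer sizes are hypothetical placeholders for the values in Table 2.

```python
import torch
import torch.nn as nn

class BLSTMWithDropout(nn.Module):
    """Dropout applied to the input and output of the recurrent layer (illustrative sketch)."""

    def __init__(self, in_dim=130, hidden=128, p=0.5):
        super().__init__()
        self.in_drop = nn.Dropout(p)     # dropout on the recurrent layer's input
        self.blstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.out_drop = nn.Dropout(p)    # dropout on the recurrent layer's output

    def forward(self, x):
        out, _ = self.blstm(self.in_drop(x))
        return self.out_drop(out)

layer = BLSTMWithDropout()
print(layer(torch.randn(2, 7, 130)).shape)   # torch.Size([2, 7, 256])
```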

4.5. Evaluation Index

In this paper, different corpora are used to verify the effect of the recurrent-NN-based model on the sequence annotation task, and F1 is used as the evaluation index for named entity recognition:

$$F_1=\frac{2PR}{P+R},$$

where $P$ and $R$ represent precision and recall, respectively. In the sequence-tagging-based EGE DAC, because the DAC of specific prepositions and articles is involved, the evaluation metrics of the 2013 CoNLL EGE DAC task are used for experimental comparison.
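For reference, precision, recall, and F1 can be computed from the counts of true positives, false positives, and false negatives as in the short sketch below; the counts shown are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision P, recall R, and F1 from true/false positive and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical counts of corrected errors, for illustration only.
print(precision_recall_f1(tp=43, fp=70, fn=140))
```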

5. Two Experimental Designs

5.1. Experimental Results of Sequence Labeling Based on Cyclic NN

This research conducts comparison tests on the POS and NER tasks using the proposed sequence annotation model based on recurrent NN. Adding a supervised coarse annotation layer to the baseline network (BLSTM + Residual), giving BLSTM + Residual + 2Cost, requires two parameter updates per training iteration, while the coarse annotation introduced for network supervision improves annotation accuracy. The accuracy of the different network structures for part-of-speech annotation of the WSJ corpus is shown in Table 3.

As shown in Table 3, the input to the second BLSTM layer in the above network consists of two parts whose data are not evenly distributed. The network structure BLSTM + Residual + 2Cost + BN improves annotation accuracy because the introduction of batch normalization standardizes these two input components. During training, the vectors of unregistered (out-of-vocabulary) words remain untrained, so the annotation results for those word segments tend to be more random and independent of the network structure. Different network structures were also used for named entity recognition on the CoNLL-2003 corpus; the results are displayed in Figure 9.

As shown in Figure 9, the CRNN was introduced in the experiments to solve the OOV problem by building word vectors from the character level and learning relationships such as word composition. The BLSTM + CRNN + Residual + 2Cost + BN network achieved an accuracy of 97.60% in the part-of-speech annotation experiments and an F1 value of 91.38% in the named entity recognition experiment.

5.2. Experimental Results of EGE DAC Based on Sequence Annotation

This paper compares the experimental results of the sequence-annotation-based EGE DAC method (LSTM GEC) on the CoNLL EGE DAC evaluation data with the corpus-based EGE DAC method (Corpus GEC) [31] and with UIUC, the best system in the 2013 evaluation. The error correction results for articles and prepositions are shown in Figure 10.

As shown in Figure 10, the F1 value of the method in this paper for article error correction is 5% higher than that of the UIUC method and 5% higher than that of the Corpus GEC method. For preposition error correction, its F1 value is 21% higher than the UIUC method and 13% higher than the Corpus GEC method. This demonstrates that the sequence-annotation-based EGE DAC method described in this paper is effective at correcting article and preposition errors, because the word vectors carry rich contextual information and the sequence annotation model can better learn the dependency information that determines the use of articles or prepositions.

6. Conclusion

There are many grammatical problems in English writing, such as spelling mistakes, word misuse, and tense errors. Automatic detection and correction of grammatical errors in English writing is therefore critical to the learning and teaching of ESL students and teachers. Traditional approaches to this task are rule-based or statistics-based: rule-based techniques rely on manually extracting a large number of correction rules, which is time-consuming and can lead to conflicts between rules, while statistical methods depend on large-scale corpora for support and their correction effect is limited. In this research we study the sequence labeling model and the Seq2Seq NN model based on recurrent NN and use these two models to DAC EGE; an EGE DAC approach based on sequence annotation is proposed using the sequence annotation model developed in this paper. Overcorrection issues, such as unnecessary synonym replacement, still arise, and it remains a challenge to regularize the sequence-to-sequence mapping and tackle this type of noise problem in decoding.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.