Code Analysis and Software Mining in Scientific and Engineering Applications
Research Article, Open Access
Md. Mostafizer Rahman, Yutaka Watanobe, Keita Nakamura, "A Neural Network Based Intelligent Support Model for Program Code Completion", Scientific Programming, vol. 2020, Article ID 7426461, 18 pages, 2020. https://doi.org/10.1155/2020/7426461
A Neural Network Based Intelligent Support Model for Program Code Completion
Abstract
In recent years, millions of source codes have been generated in different languages on a daily basis all over the world. A deep neural network-based intelligent support model for source code completion would be a great advantage in the software engineering and programming education fields. Vast numbers of syntax, logical, and other critical errors that cannot be detected by normal compilers continue to exist in source codes, and the development of an intelligent evaluation methodology that does not rely on manual compilation has become essential. Even experienced programmers often find it necessary to analyze an entire program in order to find a single error and are thus forced to waste valuable time debugging their source codes. With this point in mind, we propose an intelligent model that is based on long short-term memory (LSTM) combined with an attention mechanism for source code completion. The proposed model can detect source code errors along with their locations and then predict the correct words. In addition, the proposed model can classify source codes as erroneous or not. We trained our proposed model using source code and then evaluated its performance. All of the data used in our experiments were extracted from the Aizu Online Judge (AOJ) system. The experimental results show that the accuracy of our proposed model in terms of error detection and prediction is approximately 62% and that its source code classification accuracy is approximately 96%, outperforming a standard LSTM and other state-of-the-art models. Moreover, in comparison to state-of-the-art models, our proposed model achieved a notable level of success in terms of error detection, prediction, and classification when applied to long source code sequences. Overall, these experimental results indicate the usefulness of our proposed model in the software engineering and programming education arenas.
1. Introduction
Programming is one of mankind’s most creative and effective endeavors, and vast numbers of studies have been dedicated to improving the modeling and understanding of software code [1]. The outcomes of many such studies now support a wide variety of core software engineering (SE) purposes, such as error detection, error prediction, error location identification, snippet suggestions, code patch generation, developer modeling, and source code classification [1]. Since learners and professionals around the world are constantly creating large numbers of new programs to improve our lives, it is a general truism that no program is ever released without undergoing a comprehensive post-development debugging process. Almost every software product goes through different testing phases in the SE cycle. Once errors are detected in the source code at the production or testing phases, the debugging process begins immediately to find and fix them. This means that learners and professionals spend vast amounts of time attempting to find errors in source codes. To find a single error, it is often necessary to verify an entire program, which is a lengthy and time-consuming process. This adverse situation has resulted in the emergence of a new SE research window [2]. There are significant numbers of errors that are commonly made by students, programmers, and developers. These include missing semicolons, delimiters, irrelevant symbols, variables, missing braces, incomplete parentheses, operators, missing methods, classes, inappropriate classes, inappropriate methods, irrelevant parameters, logic errors, and other critical errors. Although such errors often indicate inexperience, insufficient attention to detail, and other unsuitable behaviors, a Google study on programming showed that such errors can creep into the work of even the most experienced programmers [3].
Programming is a very sensitive and error-prone task, and a single mistake can eventually be harmful to software end users. Because source code is highly error-prone during development, intelligent support models for code completion have become an interesting research area. Among the solutions now being explored, the use of artificial intelligence (AI) offers fascinating potential for solving source code-related complications. In the past few years, natural language processing (NLP) developers have produced some extraordinary outcomes in different domains such as language processing, machine translation, and speech recognition. The reasons for the wide-ranging success of NLP are found in its corpus-based methods, statistical applications, messenger suggestions, handwriting recognition, and increasingly large corpora of text. For example, n-gram models are among the stochastic language model forms that can be used for predicting the next item based on the corpus. N-gram models such as bigram and trigram, along with related statistical approaches such as skip-gram [4] and GloVe [5], are very useful in language modeling tasks. This burgeoning usage has stimulated the availability of large text corpora and is helping NLP techniques to become more effective day by day. NLP language models are not particularly effective when used in complex SE endeavors but remain useful as intuitive language models. As a result, numerous researchers have focused their efforts on source code completion tasks using neural network-based language models. In [6], the authors proposed a local cache model that dealt with the localness of software code but still encountered problems with small-context source code when using an n-gram model. That study determined that neural network-based language models could provide robust alternatives for representing source code.
Additionally, another study showed that the recurrent neural network (RNN) model, which is capable of retaining longer source code context than traditional n-gram and other language models, has achieved notable success in language modeling [7]. However, the RNN model faces limitations when it comes to representing the context of longer source codes due to gradient vanishing or exploding [8], which makes it hard to train on source code sequences with long dependencies. As a result, it is only effective for a small corpus of source codes. To minimize gradient vanishing or exploding problems, the RNN model has been extended to create long short-term memory (LSTM) networks [8]. An LSTM network is a special kind of RNN that can remember longer source code sequences due to its extraordinary internal structure.
In this paper, we present an intelligent support model for source code completion that was designed using an LSTM in combination with an attention mechanism (hereafter LSTM-AM), which improves on the performance of a standard LSTM model. The attention mechanism is a useful technique that takes into account the results of all past hidden states for prediction, and it can improve the accuracy of neural network-based intelligent models. We trained LSTM, RNN, CNN, and LSTM-AM networks with different numbers of hidden units (neurons), such as 50, 100, 200, and 300, using a set of source codes taken from the AOJ system. Erroneous source codes were then input into all models to determine their relative capabilities in regard to predicting and detecting source code errors. The obtained results show that our LSTM-AM network extends the capabilities of the standard LSTM network in terms of detecting and predicting such errors correctly. Some source codes contain logical errors and other critical errors that cannot be detected by the usual compilers, whereas our proposed intelligent support model can detect these errors. Additionally, the LSTM-AM network can retain a longer sequence of source code inputs and thus generate more accurate output than the standard LSTM and other state-of-the-art networks. In addition, we experimented with different settings and numbers of hidden units to create the most suitable model for our research in terms of cross-entropy, training time, accuracy, and other performance measurements. The proposed model can also classify source codes based on the defects they contain. We expect that our proposed model will be useful for students, programmers, developers, and professionals as well as other persons involved in programming education and other aspects of SE.
The main contributions of this study are as follows:
(i) The proposed intelligent support model can help students and programmers with source code completion
(ii) The intelligent support model detects errors, such as logical errors, that cannot be identified using conventional compilers
(iii) The accuracy of the proposed intelligent support model is approximately 62%, which outperforms other benchmark models
(iv) Our proposed model can classify source codes based on the detected errors, with a classification accuracy of 96%, which is much better than that of other models
(v) The proposed model highlights defective spots in source codes along with their locations/line numbers
(vi) The proposed model improves the ability of learners to fix errors in source code easily by using the locations/line numbers
The remainder of the paper is structured as follows. In Section 2, we present the background of our study and discuss prior research. Section 3 gives an overview of natural language processing and artificial neural networks. In Section 4, we present our proposed approach. Data collection and problem description issues are presented in Section 5. The experimental results are described in Section 6. In Section 7, we discuss the results. Finally, Section 8 concludes this paper with some future work proposals.
2. Background and Prior Research
Modern society is flourishing due to advancements in the wide-ranging fields of information and communication technology (ICT), where programming is a crucial aspect of many developments. Millions of source codes are being created every day, most of which are tested through manual compiling processes. As a result, an important research field that has recently emerged involves the use of AI systems for source code completion during development rather than manual compiling processes. More specifically, artificial neural network-based models are being used for source code completion in order to achieve more human-like results. Numerous studies have been completed and a wide variety of different methods have been proposed regarding the use of AI in SE fields, some of which are reviewed below.
In [1], the authors present a language model for source code testing that uses a neural network instead of an existing language (i.e., n-gram) model. In most cases, n-gram language models cannot handle long source code sequences effectively, so neural network-based language models were developed to improve source code analyses. In the cited study, RNN and LSTM language models were trained and the obtained results showed that the LSTM model performed better than the RNN model. That study used a Java project corpus to evaluate the performance of the language models.
In [4], the authors proposed a novel LSTM-based source code correction method that used segment similarities. More specifically, the study utilized the sequence-to-sequence (seq2seq) model for the source code correction process. The seq2seq model is a machine learning approach that is very effective for language modeling tasks such as machine translation, conversational modeling, text translation, and image captioning.
White et al. [7] proposed a deep software language model using an RNN. The experimental results obtained using a Java corpus showed that the proposed software language model outperformed conventional models such as cache-based n-gram and standard n-gram models. That software language model showed significant promise for use in SE fields.
In [9], the authors presented a novel Tree-LSTM-based model in which the LSTM units are organized as a tree. That model was assessed on semantic relatedness prediction tasks based on sentence pairs and on sentiment classification. Meanwhile, in [10], the authors proposed a method that classified archived source codes by language type using an LSTM. Their experimental results demonstrated that the proposed LSTM surpassed the linguist classifier, the Naive Bayes (NB) classifier, and other similar networks.
In [11], the authors proposed a technique that automatically identifies and corrects source code syntax errors using an RNN. Their proposed SYNFix algorithm finds the error location of the next predicted token sequence, after which the identified error is solved by either replacing or inserting a proper word. A significant limitation of this technique is that it cannot recover or handle multiple syntax errors in a source code sequence.
Pedroni and Meyer [12] studied which types of compiler messages can best assist novice programmers in identifying source code errors and what actions the error messages call for. In that study, the authors experimentally showed that certain message types help novice programmers more than others.
In [13], the authors presented a model for correcting syntax errors in source code written in the C programming language. That model, called DeepFix, uses a multilayered sequence-to-sequence neural network combined with an attention mechanism. The authors also proposed a trained RNN that can predict an error with its location number and fix the error with a proper token. The experimental results obtained showed that, out of a total of 6971 source code errors, this approach completely fixed about 27% and partially fixed about 19%.
Rahman et al. [14] proposed a language model using LSTM for fixing source code errors. The proposed model combines an attention mechanism with LSTM, which increases the effectiveness of a standard LSTM. Experimental results showed that the model corrected a significant share of the errors in the source codes.
In [15], the authors proposed a source code bug detection technique that works by varying the hyperparameters of an LSTM in order to investigate perplexity issues and training time. The results obtained show that LSTM produces significant results for source code error detection.
Bahdanau et al. [16] proposed a language translation model that uses an RNN. More specifically, the encoder-decoder technique is used as a translator in which a source text is encoded into a fixed-length vector. By utilizing that vector, the decoder can translate the sentences. The paper overcomes the fixed-length limitation by allowing a (soft) search over the source sentence to predict a target word instead of using hard segments of the source sentence.
Li et al. [17] point out the limitations of neural network language models. To overcome those problems, the authors proposed two new approaches: an attention mechanism-enhanced LSTM and a pointer mixture network. The attention mechanism-enhanced LSTM is used to alleviate the fixed-size vector limitation and improve memorization capability by providing a variety of ways for gradients to backpropagate. In contrast, the pointer mixture network predicts out-of-vocabulary (OoV) words by considering locally repeating tokens. That study also proposed the use of an abstract syntax tree (AST) based code completion method.
Li et al. [18] presented a source code defect prediction model using a convolutional neural network called DP-CNN. An abstract syntax tree (AST) was used to convert the source code into token vectors. Using a word embedding map, each token vector is converted into a numerical vector, which the CNN uses for training. Thereafter, the CNN model generates the source code’s semantic and structural features. DP-CNN improved the performance by 12% compared with other state-of-the-art models and by 16% compared with traditional feature-based methods.
Dam et al. [19] presented a deep learning model for software defect prediction. The model incorporates the abstract syntax tree (AST) into an LSTM network, where each node of the AST structure is treated as an LSTM unit. The deep tree-based LSTM model stores syntactical and structural information of source codes for accurate prediction. The learning style of the tree-based LSTM model is unsupervised. The model does not clean or replace any erroneous words by predicting correct words. It is used to generate an error probability from a source code; thereafter, a classifier identifies the source code’s defects by using the probability value.
Pham et al. [20] used CNN as a language model based on Feed Forward Neural Network (FFNN). Experimental results demonstrated that the performance of the CNN language model is better than the normal FFNN. As for recurrent models, the CNN language model performs well compared with the RNN, but below the stateoftheart LSTM model.
In [21], the authors proposed an RNN-based model for source code fault prediction. Two familiar evaluation measures, the area under the curve (AUC) and the F1-measure, were used to assess performance. The proposed attention-based RNN model improves the accuracy of source code classification. The AUC and F1-measure were 7% and 14% higher, respectively, than those of the other benchmark models.
In summary, a wide variety of methods and techniques have been proposed in various studies, most of which used RNN, LSTM, or convolutional neural network (CNN) models for source code manipulation and other applications. It is difficult to say which of these proposed approaches is superior to the others. RNNs perform comparatively better than conventional language models, but RNNs have limited ability to handle long source code inputs [7]. An LSTM is a special kind of RNN that can remember longer source code sequences due to its extraordinary internal structure and thus overcome RNN shortcomings. Our proposed model differs from these models. Our proposed LSTM-AM network further extends the capabilities of a standard LSTM network to the point where it can be used for detecting and predicting source code errors as well as for source code classification. A standard LSTM network uses only the last hidden state to make predictions. In contrast, our LSTM-AM network can take the outcomes of all previous hidden states into consideration when making predictions. Therefore, it is a more promising technique for use in source code manipulation than other state-of-the-art language models.
3. Overview of Language Model and Artificial Neural Networks
3.1. N-Gram Language Model
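The resources of natural text corpora are being enriched by the accumulation of text from multiple sources on a daily basis. The success of natural language processing is based on this rich text corpus. For this reason, and because of their simplicity and scalability, n-gram models are popular in the field of natural language processing. An n-gram model predicts the upcoming word or text of a sequence based on probability, and the probability of an entire word sequence P(w_{1}, w_{2}, …, w_{m}) can be calculated by using the chain rule of probability:
$$P(w_1, w_2, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1}) \tag{1}$$
The Markov assumption, under which the probability of a word depends solely upon the previous word, is described by
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1}) \tag{2}$$
Thus, the general equation of an n-gram used for the conditional probability of the next word in a sequence is as follows:
$$P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) \tag{3}$$
In practice, the maximum likelihood estimate is computed from corpus counts and can be refined by many smoothing techniques [22], as shown in the following equation:
$$P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{\mathrm{count}(w_{i-n+1}, \ldots, w_{i-1}, w_i)}{\mathrm{count}(w_{i-n+1}, \ldots, w_{i-1})} \tag{4}$$
Cross-entropy is measured to validate the prediction goodness of a language model [23]. Low cross-entropy values imply better language models.
$$H = -\frac{1}{m} \sum_{i=1}^{m} \log_2 P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) \tag{5}$$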
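The n-gram estimate and the cross-entropy measure described above can be illustrated with a minimal bigram sketch. It is not the authors' implementation: the toy corpus, the function names, and the choice of add-k smoothing are assumptions made only for illustration.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count contexts and bigrams in a token sequence."""
    contexts = Counter(tokens[:-1])             # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams

def bigram_prob(prev, word, contexts, bigrams, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(word | prev); smoothing choice is an assumption."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * vocab_size)

def cross_entropy(tokens, contexts, bigrams, vocab_size):
    """Average negative log2 probability per predicted token."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(
        math.log2(bigram_prob(p, w, contexts, bigrams, vocab_size))
        for p, w in pairs
    )
    return -log_sum / len(pairs)

# Toy corpus of C-like tokens (hypothetical data).
corpus = "int a = 0 ; int b = 1 ; a = a + b ;".split()
contexts, bigrams = train_bigram(corpus)
vocab = sorted(set(corpus))

familiar = cross_entropy("int a = 0 ;".split(), contexts, bigrams, len(vocab))
scrambled = cross_entropy("; = int + 0".split(), contexts, bigrams, len(vocab))
print(familiar, scrambled)
```

A sequence that resembles the training corpus receives a lower cross-entropy than a scrambled one, which is the sense in which low values indicate a better fit.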
3.2. Recurrent Neural Network (RNN)
An RNN is a neural network variant that is frequently used in natural language processing, classification, regression, etc. In a traditional neural network, inputs are processed through multiple hidden layers and then output via the output layer. For sequentially dependent input, a general neural network cannot produce accurate results. For example, given the sentence “Rose is a beautiful flower,” a general neural network takes the input “Rose” and produces an output based solely on “Rose.” Then, when the input “is” is considered, the network does not use the previous result for “Rose.” Instead, it simply produces a result using the word “is.” Similarly, a simple neural network takes the other words “a,” “beautiful,” and “flower” and generates results without considering the results of the previous inputs. To address this problem, the RNN emerged with an internal memory that retains the results of previous time steps. A simple RNN structure is shown in Figure 1.
Mathematically, the current state of an RNN can be expressed by equation (6):
$$h_t = f(h_{t-1}, x_t) \tag{6}$$
where h_{t} is the current state, h_{t−1} is the previous state, and x_{t} is the current input.
With tanh as the activation function, the state update is given by equation (7):
$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t) \tag{7}$$
where W_{hh} is the weight of the recurrent neuron and W_{xh} is the weight of the input neuron.
Finally, the output function can be written as follows:
$$y_t = W_{hy} h_t \tag{8}$$
where W_{hy} is the weight for the output layer and y_{t} is the output.
An RNN supports several input and output configurations, such as one to one, one to many, many to one, and many to many. Despite all these advantages, the RNN is susceptible to the major drawbacks of gradient vanishing or exploding.
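The recurrence described above can be sketched in a few lines of plain Python. The weight values, the dimensions, and the helper names (`matvec`, `rnn_step`, `rnn_forward`) are hypothetical and chosen only to make the example small; it is a sketch of the general technique, not the network trained in this study.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x_t, W_hh, W_xh):
    """One recurrent update: h_t = tanh(W_hh h_{t-1} + W_xh x_t)."""
    recurrent = matvec(W_hh, h_prev)
    current = matvec(W_xh, x_t)
    return [math.tanh(r + c) for r, c in zip(recurrent, current)]

def rnn_forward(xs, W_hh, W_xh, W_hy):
    """Run the recurrence over a sequence, emitting y_t = W_hy h_t at each step."""
    h = [0.0] * len(W_hh)  # initial hidden state
    outputs = []
    for x_t in xs:
        h = rnn_step(h, x_t, W_hh, W_xh)
        outputs.append(matvec(W_hy, h))
    return outputs, h

# Hypothetical weights: 2 hidden units, 3-dimensional one-hot inputs, 1 output.
W_hh = [[0.5, -0.1], [0.2, 0.3]]
W_xh = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]]
W_hy = [[1.0, -1.0]]
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

outputs, h_last = rnn_forward(xs, W_hh, W_xh, W_hy)
_, h_last_reversed = rnn_forward(list(reversed(xs)), W_hh, W_xh, W_hy)
```

Because the hidden state carries earlier inputs forward, feeding the same inputs in reverse order yields a different final state, which is what distinguishes the RNN from the memoryless network in the “Rose is a beautiful flower” example.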
3.2.1. Gradient Vanishing and Exploding
In this section, we discuss the RNN gradient vanishing and exploding problems. Training an RNN may seem simple, but it is very hard because of the recurrent connections. In forward propagation, the weight matrices are multiplied repeatedly, and a similar procedure applies to backpropagation. During backpropagation, the signal may become too strong or too weak, which causes exploding or vanishing gradients. Vanishing gradients make it difficult to determine the direction in which the model parameters should move to improve the loss function. Exploding gradients, on the other hand, make learning unstable. The hidden layer of the RNN is trained across the time steps using backpropagation. The total error gradient is equal to the sum of the gradients at every time step. Considering T total time steps, the error gradient can be expressed by the following equation:
$$\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \frac{\partial E_t}{\partial W} \tag{9}$$
Now, we apply the chain rule to calculate the overall error gradients:
$$\frac{\partial E_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial y_t} \frac{\partial y_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial W} \tag{10}$$
The term \partial h_t / \partial h_k involves the product of Jacobians, as shown in the following equation:
$$\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \tag{11}$$
The term \partial h_j / \partial h_{j-1} in equation (12) is evaluated using equation (7):
$$\frac{\partial h_j}{\partial h_{j-1}} = \mathrm{diag}\left(1 - h_j \odot h_j\right) W_{hh} \tag{12}$$
Now, by eigendecomposition of the Jacobian matrix \partial h_j / \partial h_{j-1}, we obtain the eigenvalues λ_{1}, λ_{2}, …, λ_{n}, where |λ_{1}| > |λ_{2}| > ⋯ > |λ_{n}|, and the corresponding eigenvectors v_{1}, v_{2}, …, v_{n}. If the hidden state is perturbed by Δh in the direction of v_{i}, then the gradient will be
$$\lambda_i \, \Delta h \tag{13}$$
As equation (14) shows, over t time steps the product of the Jacobians of the hidden state sequence scales the perturbation to
$$\lambda_i^{t} \, \Delta h \tag{14}$$
It is easy to see that the term λ_{1}^{t} dominates this product. In summary, if the magnitude of the greatest eigenvalue λ_{1} is less than 1, then the gradient will vanish, whereas a magnitude greater than 1 causes the gradient to explode [24]. To alleviate the gradient vanishing or exploding problems, techniques such as gradient clipping, input reversal, identity initialization, weight regularization, and LSTM can be used.
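A small numeric sketch makes the eigenvalue argument concrete. It assumes, for illustration only, a perturbation aligned with a single eigenvector so that the repeated Jacobian product reduces to a power of one eigenvalue; the function names and the clipping threshold are hypothetical. Gradient clipping by norm is shown as one of the mitigation techniques listed above.

```python
def gradient_scale(eigenvalue, steps):
    """Factor applied to a perturbation along one eigenvector after
    `steps` repeated Jacobian multiplications (eigenvalue ** steps)."""
    return eigenvalue ** steps

def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so its Euclidean norm does not exceed max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]

vanishing = gradient_scale(0.9, 50)   # eigenvalue below 1: shrinks toward 0
exploding = gradient_scale(1.1, 50)   # eigenvalue above 1: grows rapidly
clipped = clip_by_norm([3.0, 4.0], 1.0)
print(vanishing, exploding, clipped)
```

Even eigenvalues close to 1 lead to very small or very large factors after a few dozen time steps, which is why long source code sequences are difficult for a plain RNN.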
3.3. LSTM Network
An LSTM neural network is a special kind of RNN that is often used to process long inputs. An LSTM is not limited to a single input but can also process complete input sequences. Usually, an LSTM is structured with four components: the forget gate, the input gate, the cell state, and the output gate. Each has a separate role: the cell state keeps the information of the input sequence, and the gates manage the input and output activities. Figure 2 shows the structure of a basic LSTM unit.
At the very beginning, processing starts with the forget gate, which determines which information has to be discarded from (or retained in) the cell state. The forget gate applied to cell state c_{t−1} can be expressed by equation (15), where h_{t−1} is the previous hidden state and x_{t} is the current input. The output of the forget gate, a value between 0 and 1, is produced through a sigmoid function. If the result of the forget gate is close to 1, then we keep the data in the cell state; if it is close to 0, we discard the data.
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{15}$$
The input gate determines which cell state values should be updated when new data appears. Through the tanh function, the candidate value for the cell state is then created.
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{16}$$
$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{17}$$
Now, we update the old cell state c_{t−1} to the new cell state c_{t}:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{18}$$
The output gate o_{t} is computed via the sigmoid function, and a filtered version of the cell state becomes the new hidden state h_{t}:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{19}$$
$$h_t = o_t \odot \tanh(c_t) \tag{20}$$
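The gate sequence described above (forget, input, candidate, cell update, output) can be written as a single step function. This is a minimal sketch of the commonly used LSTM formulation in plain Python; the parameter names (`W_f`, `b_f`, and so on) and all numeric values are assumptions for illustration and are not taken from the trained model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]          # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]          # input gate
    c_tilde = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]   # cell update
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]          # output gate
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                           # hidden state
    return h, c

# Hypothetical parameters: 1 hidden unit, 1 input feature.
# A large forget bias and a very negative input bias make the cell keep its old value.
params = {
    "W_f": [[0.0, 0.0]], "b_f": [20.0],
    "W_i": [[0.0, 0.0]], "b_i": [-20.0],
    "W_c": [[0.0, 0.0]], "b_c": [0.0],
    "W_o": [[0.0, 0.0]], "b_o": [0.0],
}
h, c = lstm_step([1.0], [0.0], [0.7], params)
```

With the forget gate near 1 and the input gate near 0, the cell state passes through almost unchanged, which is the mechanism that lets an LSTM retain information over long sequences.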
Recognizing the strength of LSTM, we were motivated to apply this network model to error detection, prediction, correction, and classification in source codes.
4. Proposed Approach
Our proposed LSTM-AM network has an effective deep learning architecture that is used as an intelligent support model for source code completion. Accordingly, we trained our model using correct source codes and then used it successfully to detect errors and predict correct words in erroneous source codes based on the trained corpus. Moreover, the proposed model can classify the source codes using the prediction results. After examining each source code, our model generates a complete feedback package from which learners and professionals can benefit. The workflow of our proposed model is depicted in Figure 3.
4.1. Proposed LSTM-AM Network
Over the years, attention mechanisms have been adapted to a wide variety of diverse tasks [25–30], the most popular and effective of which is sequence-to-sequence modeling. Typically, in sequence-to-sequence modeling, the output of the last hidden state is used as the context vector for further consideration. It is very difficult to process long input sequences using the sequence-to-sequence model [31]. The attention mechanism makes it possible to map all previous hidden state outputs, including the latest hidden state output, to produce the most relevant and accurate results.
With this point in mind, we incorporated an attention mechanism into a standard LSTM to make LSTM-AM, as shown in Figure 4. This strengthens our model’s ability to predict longer source code sequences. Attention usually improves the performance of language and translation models by merging all hidden state outputs with the softmax function; sometimes, attention mechanisms work as a dense layer. Recently, attention mechanisms have been used in machine translation tasks with great success. Furthermore, a machine translation model sometimes has to compress entire input sequences into a smaller vector, so there is a possibility of information loss. The use of attention mechanisms has fixed this problem. In our proposed model, an attention mechanism is combined with an LSTM. Although the ability of a standard LSTM to capture long-range dependencies is far superior to that of an RNN, it still encounters problems when a hidden state has to carry all the necessary data in a small-sized vector [31]. The introduction of attention mechanisms and their alignment with neural language models such as LSTM are aimed at overcoming these problems [16]. The attention mechanism allows neural language models to retrieve and use appropriate information from all past hidden states. As a result, the network’s retention ability is improved and diverse paths are provided for gradients to backpropagate. More detailed mathematical illustrations of attention mechanisms can be found in [17].
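For our attention mechanism, we keep an external memory of the E previous hidden states, which is denoted by M_{t} = [h_{t−E}, …, h_{t−1}] ∈ R^{k×E}, where k is the hidden state size. At time t, the attention layer uses h_{t} and M_{t} to compute the attention weights α_{t} and the context vector c_{t}, following the formulation in [17]:
$$A_t = v^{\top} \tanh\left(W^{m} M_t + (W^{h} h_t) 1_E^{\top}\right) \tag{21}$$
$$\alpha_t = \mathrm{softmax}(A_t) \tag{22}$$
$$c_t = M_t \alpha_t^{\top} \tag{23}$$
where W^{m}, W^{h} ∈ R^{k×k} and v ∈ R^{k} are trainable parameters and 1_{E} is an E-dimensional vector of ones.
To predict the next word at time step t, judgment is based not only on the current hidden state h_{t} but also on the context vector c_{t}. The result is then projected onto the vocabulary space to produce the final probability via the softmax function. Here, G_{t} is an output vector.
$$G_t = \tanh(W^{g} h_t + W^{v} c_t) \tag{24}$$
$$y_t = \mathrm{softmax}(W^{y} G_t + b^{y}) \tag{25}$$
where W^{g}, W^{v} ∈ R^{k×k} and W^{y} ∈ R^{V×k} are trainable projection matrices, b^{y} ∈ R^{V} is a bias, and V is the vocabulary/dictionary size.
Based on the above aspects, we can see that the use of an attention mechanism helps to effectively extract the relevant features from input sequences. As such, the use of LSTM-AM will increase the capability of our model.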
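The attention computation described in this section can be sketched as follows: score each stored hidden state against the current one, normalize the scores with softmax, form the context vector as a weighted sum, and project onto the vocabulary. The additive scoring form follows the style of [17]; every matrix value, dimension, and function name below is a hypothetical placeholder rather than a trained parameter of LSTM-AM.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(memory, h_t, W_m, W_h, v):
    """Return attention weights over past hidden states and the context vector."""
    proj_h = matvec(W_h, h_t)
    scores = []
    for m_j in memory:  # one score per stored hidden state
        proj_m = matvec(W_m, m_j)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, proj_m, proj_h)))
    alpha = softmax(scores)
    context = [sum(a * m_j[d] for a, m_j in zip(alpha, memory)) for d in range(len(h_t))]
    return alpha, context

def predict(h_t, context, W_g, W_v, W_y, b_y):
    """Combine hidden state and context, then project onto the vocabulary."""
    g = [math.tanh(a + b) for a, b in zip(matvec(W_g, h_t), matvec(W_v, context))]
    logits = [z + b for z, b in zip(matvec(W_y, g), b_y)]
    return softmax(logits)

# Hypothetical sizes: hidden size k = 2, memory of E = 3 states, vocabulary V = 3.
identity = [[1.0, 0.0], [0.0, 1.0]]
memory = [[0.1, 0.2], [0.9, -0.3], [0.4, 0.4]]
h_t = [0.5, 0.1]
alpha, context = attention(memory, h_t, identity, identity, [1.0, 1.0])
probs = predict(h_t, context, identity, identity,
                [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
```

The context vector is a convex combination of the stored hidden states, so information from any earlier position can reach the prediction directly instead of passing only through the last hidden state.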
5. Data Acquisition and Problem Description
An online judge (OJ) system is a web-based programming environment that compiles and executes submitted source codes and returns judgments based on test data sets. An OJ system is an open platform for programming practice as well as competition. To conduct our experimental work, we collected source code from the AOJ system [32, 33]. Currently, the AOJ system is used for various programming competitions and in academia. As of May 2020, about 75,000 users regularly carry out programming activities on the AOJ platform, which offers 2100 independent problem sets. All problem sets are classified based on different algorithms and branches of computer science [14]. As a result, about 4.5 million solution source codes have been archived on the AOJ platform, encouraging better research in the field of software engineering. We used source codes from the AOJ system exclusively for training and testing in order to avoid threats or difficulties in evaluating our proposed model. For model training, we selected all of the correct solutions written in the C language for three problems: the greatest common divisor (GCD), insertion sort (IS), and prime numbers (PN). There are a total of 2285 correct source codes for the IS problem, and the overall solution success rate is 35.16%. The total number of correct source codes for the GCD problem is 1821, and the overall solution success rate is 49.86%. In the GCD problem, two inputs (a and b) are given in a line, after which the greatest common divisor of a and b is output, as shown in Figure 5(a).
Figure 5: Problem descriptions. (a) Greatest common divisor (GCD). (b) Prime numbers (PN).
For the PN problem, the total number of correct source codes is 1538 and the overall solution success rate is 30.8%. In the PN problem description, the first line contains an integer N. The code needs to count the number of prime numbers in the list of N elements, as shown in Figure 5(b).
5.1. Data Preprocessing and Training
Before we conducted training, raw source codes were filtered by removing unnecessary elements. To accomplish this, we followed the procedure applied in [14] for source code embedding and tokenization. First, we removed all irrelevant elements from the codes, such as newlines (\n), comments, and tabs (\t). After that, all the remaining elements of the code were converted into word sequences where numbers, functions, tokens, keywords, variables, classes, and characters were treated as simple words. The whole code transformation process is called tokenization and vocabulary creation. Then, each word was encoded with an ID, with the function names, keywords, variable names, and characters encoded as listed in Table 1. The flowchart of the training and evaluation process of our model is shown in Figure 6.

At the early stage of the training phase, the source codes were first converted into word sequences and then encoded into token IDs as shown in Figure 7. This conversion process is called word embedding and tokenization.
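The preprocessing steps above (strip comments and whitespace, split into words, map words to IDs) can be sketched as follows. The regular expressions and the ID assignment scheme are simplifying assumptions for illustration; they do not reproduce the encoding of Table 1, and the comment stripping does not handle comment markers inside string literals.

```python
import re

# Identifiers/keywords, integer literals, a few two-character operators, then any
# single punctuation character. This pattern is an illustrative simplification.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||\+\+|--|[^\s\w]")

def strip_comments(code):
    """Remove /* ... */ and // ... comments from C source text."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.S)
    code = re.sub(r"//[^\n]*", " ", code)
    return code

def tokenize(code):
    """Convert source text into a flat word sequence (newlines and tabs vanish)."""
    return TOKEN_PATTERN.findall(strip_comments(code))

def build_vocab(token_lists):
    """Assign each distinct word an integer ID in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)  # hypothetical scheme: IDs start at 1
    return vocab

def encode(tokens, vocab):
    return [vocab[t] for t in tokens]

source = """
int main() {
    int x = 1, y = 2; // operands
    int z = x + y;    /* sum */
    return 0;
}
"""
tokens = tokenize(source)
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)
print(tokens[:5], ids[:5])
```

Repeated words such as `int` map to the same ID, so the encoded sequence preserves the structure of the program while discarding layout and comments.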
Upon completion of the embedding and tokenization process, we trained our proposed model and other related state-of-the-art models with the correct source codes of the IS, GCD, and PN problems. The simple training process of an LSTM-based language model is shown in Figure 8.
At the end of the training process, the next step is to check the performance of the model for the source code completion task. How accurately it identifies errors and predicts corrections? Our proposed model created the probability for each word. We considered a word will be an error candidate whose probability is below 0.1 [14]. Additionally, to test the model loss function, we calculated the crossentropy for each epoch at the softmax layer. Crossentropy is defined as the difference between actual and predicted results. Softmax is an activation function that creates probabilities. Typically, softmax is used as the last layer of neural networks. The output range of the softmax function is between 0 and 1. The softmax layer received x = [x_{1}, x_{2}, x_{3}, x_{4}, …, x_{n}] and returns probability p = [p_{1}, p_{2}, p_{3}, p_{4}, …, p_{n}], as defined in the following equation:
Crossentropy is an effective performance measurement indicator for the probabilitybased model. Lowvalued crossentropy indicates a good model.
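The softmax and cross-entropy computations can be illustrated with a short sketch. It assumes the usual one-hot form of cross-entropy for next-word prediction (the negative log of the probability assigned to the actual word); the function names and the input scores are hypothetical and do not come from the paper.

```python
import math

def softmax(x):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross-entropy against a one-hot target (assumed form): -log p[target]."""
    return -math.log(probs[target_index])

scores = [2.0, 1.0, 0.1]               # hypothetical scores for three words
p = softmax(scores)
loss_if_first = cross_entropy(p, 0)    # actual word is the most probable one
loss_if_last = cross_entropy(p, 2)     # actual word is the least probable one
```

The loss is smaller when the model assigns a high probability to the actual word, which is why a low cross-entropy indicates a good model.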
A simple example of the prediction process used by our model is shown in Figure 9. An input sequence example is {“ = ,” “x,” “+,” “y”}; then the model calculates the next probable correct word based on the source code corpus. Finally, the word with the highest probability is the winner of the next predicted word. Based on the input sequence in the example above, the correct predicted word is {“;”}.
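The detection and prediction rule can be sketched as follows: a word is flagged as an error candidate when the model assigns it a probability below 0.1, and the suggested correction is the word with the highest probability. The function name `check_word` and the probability values are illustrative assumptions, not outputs of the trained model.

```python
ERROR_THRESHOLD = 0.1  # from the text: probability below 0.1 marks an error candidate

def check_word(next_word_probs, actual_word, threshold=ERROR_THRESHOLD):
    """Return (is_error_candidate, predicted_word, predicted_probability).

    next_word_probs maps each vocabulary word to the probability the model
    assigns to it as the next word of the sequence.
    """
    predicted = max(next_word_probs, key=next_word_probs.get)
    actual_probability = next_word_probs.get(actual_word, 0.0)
    is_candidate = actual_probability < threshold
    return is_candidate, predicted, next_word_probs[predicted]

# Hypothetical distribution after the input sequence "= x + y"
probs = {";": 0.85, ")": 0.08, ",": 0.05, "z": 0.02}

ok_case = check_word(probs, ";")   # the written word matches the prediction
bad_case = check_word(probs, "z")  # the written word is improbable
```

Here `ok_case` is not flagged, while `bad_case` is flagged and “;” is suggested in its place.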
5.2. Hyperparameters
In the present research, we defined several experimental hyperparameters in order to obtain better results. To avoid overfitting, a dropout ratio of 0.3 was used for our proposed model. The LSTM network was optimized using Adam, which is a stochastic optimization method [34, 35]. The learning rate is an important factor in neural network training because it controls the learning speed of the model: learning becomes faster with a higher learning rate and slower with a lower one. In the present paper, we set the learning rate to l = 0.002, and the network weights are updated during training according to the value of l. β_{1} is the exponential decay rate for the first-moment estimate, and β_{2} is the exponential decay rate for the second-moment estimate. The values of β_{1} and β_{2} are 0.001 and 0.999, respectively. A small constant ε is used to avoid any division by zero. We trained our network with 50, 100, 150, and 200 hidden units. Each model type was named with reference to the number of units, such as the 100-unit model and the 200-unit model. After training, we assessed the created models to pick the best number of hidden units for our proposed LSTM-AM technique.
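For orientation, the hyperparameters above and a single Adam update for one scalar weight can be written as the following sketch. It follows the standard Adam update rule [35]; the decay rates and learning rate are the values stated in the text, while the value of `epsilon` (1e-8) is an assumption, since the text does not give it, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Hyperparams:
    learning_rate: float = 0.002             # l in the text
    beta1: float = 0.001                     # first-moment decay rate, as stated in the text
    beta2: float = 0.999                     # second-moment decay rate, as stated in the text
    epsilon: float = 1e-8                    # ASSUMPTION: value not given in the text
    dropout: float = 0.3
    hidden_units: tuple = (50, 100, 150, 200)

def adam_step(w, grad, m, v, t, hp):
    """One Adam update of a scalar weight w at time step t (t starts at 1)."""
    m = hp.beta1 * m + (1 - hp.beta1) * grad         # first-moment estimate
    v = hp.beta2 * v + (1 - hp.beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - hp.beta1 ** t)                  # bias correction
    v_hat = v / (1 - hp.beta2 ** t)
    w = w - hp.learning_rate * m_hat / (math.sqrt(v_hat) + hp.epsilon)
    return w, m, v

hp = Hyperparams()
# Minimise f(w) = w^2 from w = 1.0; the gradient there is 2w = 2.0.
w, m, v = adam_step(1.0, 2.0, 0.0, 0.0, 1, hp)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly one learning rate.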
6. Experimental Results
Our proposed intelligent support model can be useful for source code completion. It is also a general model and can be adapted to any source code for model training and testing. In our proposed model, we defined a minimum probability value by which the model can identify error candidate words based on the training corpus. Accordingly, we randomly chose some incorrect IS, GCD, and PN source codes and used them to evaluate the models’ performance levels. Here, we should note that all of our research work and language model training were performed on a personal computer with an Intel® Core™ i7-5600U central processing unit (CPU) clocked at 2.60 GHz and 8 GB of RAM, running a 64-bit Windows 10 operating system.
6.1. Hidden Unit Selection and Cross-Entropy Measurement
We used several hidden-unit settings (50, 100, 150, and 200) to train our proposed LSTM-AM and other state-of-the-art models. In training, the correct source codes of the IS, GCD, and PN problems were used separately, and all the source codes of IS, GCD, and PN were also used in combination. The number of source codes of each type of problem is listed in Table 2.

We trained our proposed LSTM-AM and different state-of-the-art models using correct source codes. Table 3 presents the cross-entropy over 30 epochs during training using PN source codes. The 50-, 100-, 150-, and 200-unit models took a total of 11483, 20909, 38043, and 59065 seconds, respectively, to train the LSTM-AM model using the PN problem.

Tables 4 and 5 present the cross-entropy of different models during training using GCD and IS source codes, respectively. The 50-, 100-, 150-, and 200-unit models took a total of 19005, 24110, 24273, and 30420 seconds, respectively, to train the LSTM-AM model using the GCD problem; for the IS problem, they took a total of 39643, 62756, 80100, and 100803 seconds, respectively. In contrast, other models such as LSTM and RNN took relatively less time for training.


To evaluate the efficiency of the proposed model, the epoch-wise cross-entropy during training with the 200-unit model was calculated, as depicted in Figure 10.
As mentioned above, the efficiency of a model strongly depends upon the value of the cross-entropy. During training, the 200-unit model produced the lowest cross-entropy for each type of problem set. The cross-entropy of the 200-unit model using the IS, PN, and GCD problems is shown in Figure 11.
We aimed to find the best-suited number of hidden units for our LSTM-AM network and other state-of-the-art models. To this end, we put together all the source codes (about 3442) to train our proposed and other state-of-the-art models. The cross-entropies and total times recorded at the last epoch for all the models are presented in Table 6. The cross-entropy of the 200-unit model is lower than that of the other models.

Based on the above results, the 200-unit model provides the best results because its cross-entropy is the lowest among all the unit settings; thus, we selected the 200-unit model for the LSTM-AM network and the other state-of-the-art networks.
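The selection rule amounts to choosing the unit setting with the lowest final-epoch cross-entropy. A minimal sketch follows; the function name is hypothetical, and the cross-entropy values are placeholders chosen only to make the example run, not the results reported in Table 6.

```python
def select_hidden_units(final_cross_entropy):
    """Return the hidden-unit setting with the lowest final-epoch cross-entropy."""
    return min(final_cross_entropy, key=final_cross_entropy.get)

# Placeholder values for illustration only; NOT the results in Table 6.
placeholder_results = {50: 0.9, 100: 0.7, 150: 0.6, 200: 0.5}
best_units = select_hidden_units(placeholder_results)
```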
6.2. Error Detection and Prediction
In our evaluations, we tested LSTM-AM and other state-of-the-art models using erroneous source codes. Probable error locations were marked by changing the text color and underlining the suspected erroneous portions. The proposed model also generates the probabilities of the error words and the predicted words. Since both the standard LSTM and the LSTM-AM networks identified source code errors quite well compared with the RNN and other networks when the 200-unit model was used, the 200-unit model was selected for use in all of our empirical experiments.
An erroneous source code sequence evaluated by the standard LSTM network is shown in Figure 12. Here, it can be seen that errors were detected in lines 2, 6, 15, and 16. In line 2, the word “a” in the “gcd” function was detected as an error candidate, after which the correct word was predicted to be “)” with a probability of 0.62435395. The model decided that the “gcd” function might have no arguments, so the word “)” was predicted instead of the word “a.” In line 6, the error word is “if” and the predicted word is “else” with a probability of 0.5808078. Additionally, in line 15, the predicted word is “blank space” in place of “double quotation.” Finally, in line 16, the model detected “c” as an error object and suggested with a high level of probability that it should be replaced by the word “b.” The word “c” is irrelevant within the context of the program. It can thus be confirmed that the standard LSTM model successfully detected the error candidates shown in Figure 12, as listed in Table 7.

The same incorrect source code was then evaluated by the LSTM-AM network, as shown in Figure 13. The error locations are in lines 2, 15, and 16. The word “a” in the “gcd” function was detected as an error candidate and the predicted word “)” was suggested. In line 15, the word “double quotation” was identified as a bug, and the predicted word “blank space” was suggested. The word “c” in line 16 was recognized as an error and the corresponding predicted word suggested was “b” with a probability of 0.9863272, as shown in Table 8.

Another interesting erroneous source code, which contains some logical errors, was evaluated by the standard LSTM network, as shown in Figure 14. All the detected error words in Figure 14 and their corresponding predicted corrections are listed in Table 9.

Similarly, the same erroneous source code was tested by the LSTM-AM network, as shown in Figure 15. The detailed error descriptions for Figure 15 are listed in Table 10, where it can be seen that the LSTM-AM network successfully detected all of the potential errors, including the true logical errors.

6.3. Classification of Source Codes
We evaluated our proposed LSTM-AM and other benchmark models using both clean and erroneous source codes. For extensive experiments, we selected several benchmark models against which to compare the classification results: (i) the Random Forest (RF) [36] method, (ii) the Random Forest (RF) method trained with secret attributes by a Restricted Boltzmann Machine (RBM) [37], and (iii) the Random Forest (RF) method trained with secret attributes by a Deep Belief Network (DBN) [38].
The precision, recall, and F-measure, expressed by equations (23)–(25), respectively, are used to verify the classification performance:
$$\text{Precision} = \frac{TP}{TP + FP}, \quad (23)$$
$$\text{Recall} = \frac{TP}{TP + FN}, \quad (24)$$
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \quad (25)$$
where TP is called true positive, the case in which defective source code is classified as erroneous, and FP is called false positive, the case in which clean source code is classified as erroneous. The term FN is called false negative, the case in which erroneous source code is classified as clean source code. The F-measure is the harmonic mean of recall and precision. Generally, we cannot achieve optimal results simultaneously for recall and precision. For example, if all the source codes are classified as defective, the resulting recall score will be 100%, whereas the precision score will be small. Therefore, the F-measure is a trade-off between recall and precision. The F-measure score ranges between 0 and 1; a higher score implies a better classification model.
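The three scores can be computed directly from the true positive, false positive, and false negative counts. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from Tables 11–13. The second call reproduces the situation described above in which every source code is classified as defective.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical counts for a balanced classifier.
balanced = precision_recall_f(tp=8, fp=2, fn=2)

# Everything classified as defective: 10 defective and 90 clean codes.
all_defective = precision_recall_f(tp=10, fp=90, fn=0)
```

In the second case the recall is 100% but the precision is only 10%, so the F-measure stays low.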
Under normal circumstances, our proposed language model detects all possible errors in source codes, but not all the detected errors are true errors (TE). So, we considered only TE for the classification process. An error is called a TE when the predicted probability is more than 0.90. We aligned the term true positive (TP) with our proposed model as the case in which the model detects a TE in erroneous source code. In the case of false positive (FP), at least a single TE is detected within correct source code. Finally, in the case of false negative (FN), not a single TE is detected in erroneous source code, which means that the model classifies the erroneous source code as clean code. As mentioned above, all the models were trained using correct source codes and tested on 500 randomly chosen source codes from each problem set (IS, GCD, and PN). The classification results are listed in Tables 11–13 for the IS, GCD, and PN source codes, respectively.
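The classification rule can be sketched as follows: a detection counts as a true error when its predicted probability exceeds 0.90, and a source code is classified as erroneous when at least one true error is found. The function names and the data layout (tuples of error word, predicted word, and probability) are assumptions of this sketch. The two probabilities 0.9863272 and 0.62435395 are the ones quoted in Section 6.2; the remaining sample is hypothetical.

```python
TE_THRESHOLD = 0.90  # from the text: a TE has predicted probability above 0.90

def count_true_errors(detections, threshold=TE_THRESHOLD):
    """Count detections whose predicted probability exceeds the threshold."""
    return sum(1 for _, _, prob in detections if prob > threshold)

def classify(detections):
    """Classify a source code as 'erroneous' if it contains at least one TE."""
    return "erroneous" if count_true_errors(detections) >= 1 else "clean"

def confusion_counts(samples):
    """samples: iterable of (detections, is_actually_erroneous) pairs."""
    tp = fp = fn = 0
    for detections, is_erroneous in samples:
        flagged = classify(detections) == "erroneous"
        if flagged and is_erroneous:
            tp += 1        # TE detected in erroneous code
        elif flagged and not is_erroneous:
            fp += 1        # TE detected in correct code
        elif not flagged and is_erroneous:
            fn += 1        # no TE detected in erroneous code
    return tp, fp, fn

samples = [
    ([("c", "b", 0.9863272)], True),    # probability quoted in Section 6.2
    ([("a", ")", 0.62435395)], True),   # probability quoted in Section 6.2
    ([], False),                        # clean code, nothing detected
    ([("x", "y", 0.95)], False),        # hypothetical false alarm
]
tp, fp, fn = confusion_counts(samples)
```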



In the classification process, the F-measure scores of the LSTM-AM and other state-of-the-art models are shown in Figure 16. The F-measure results show that the classification performance of our proposed model is better than that of the other methods.
7. Discussion
To assess our proposed intelligent support model, we defined three performance measurement indices, namely, error prediction accuracy (EPA), error detection accuracy (EDA), and model accuracy (MA), shown in equations (26) to (28). In particular, we evaluated our proposed model and other benchmark models using equation (28).
In most cases, the proposed model detects potential errors in the codes. Among these detected errors, only a few are original errors, which are called true errors (TE). Similarly, among all the predicted words, those that match the original correct words are called True Correct Words (TCW).
In the evaluation process, we discarded the RNN and other benchmark models because they obtained high cross-entropies, whereas the standard LSTM achieved very low cross-entropies. Therefore, we validated both the standard LSTM and LSTM-AM networks using several randomly chosen erroneous source codes. Figure 12 and Table 7 present the details of error detection and prediction by the standard LSTM. The standard LSTM detected errors in lines 2, 6, 15, and 16 and provided the corresponding candidate words “a,” “if,” “double quotation,” and “c,” respectively. The predicted correct words are “),” “else,” “blank space,” and “b.” Although these results show that the standard LSTM detected the most probable erroneous words and locations, not all of the candidate errors are true errors (TE). In line 2, the model detects “a” as an error candidate by guessing that “gcd” is a function without arguments. As a consequence, it predicts a close parenthesis “)” as the correct word. Similarly, in line 6, the model detected “if” as a candidate error word and predicted “else” as the corresponding correction. In this case, the model calculated that the word “if” started at line 3 and ended at line 5 and that the word after line 5 should be “else.” As a result, the standard LSTM predicted the word “else” in line 6 instead of the word “if.” However, both the error predictions in lines 2 and 6 were incorrect, even though they were hypothetically reasonable. It should be noted that the error candidate word “c” in line 16 is a true error (TE) and the predicted word “b” is correct. The evaluation results using the standard LSTM for the erroneous source code in Figure 12 are presented in Table 14.

In Figure 13, the LSTM-AM model detected a total of three errors in lines 2, 15, and 16, with the predicted corresponding correct words being “),” “blank space,” and “b,” respectively, as shown in Table 8. The evaluation results using the LSTM-AM for the erroneous source code in Figure 13 are presented in Table 15.

To further evaluate the performance of our proposed model, we then took a somewhat larger and more complex erroneous source code and verified it using both the standard LSTM and LSTM-AM networks, as shown in Figures 14 and 15, respectively. The erroneous source code contains a logical error in line 23. In this source code, two inputs were taken from the keyboard as the variables “a” and “b.” The higher value was assigned to variable “x” and the lower value was assigned to variable “y.” Initially, variable “x” was treated as the dividend and variable “y” as the divisor. However, line 23, which uses the modulo operator to find the initial greatest common divisor, treated the smaller-valued variable “y” as the dividend and the higher-valued variable “x” as the divisor. Following the code sequence, the correct logic would be x%y. On this basis, the LSTM-AM network identified the logical error correctly by considering the previous source code sequence, whereas the standard LSTM could not detect the logical error in line 23. The evaluation results for the erroneous source codes in Figures 14 and 15 are listed in Table 16, where it can be seen that the LSTM-AM network performed even better in the case of long and complex source codes with logical or other errors.
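The effect of swapping the operands of the modulo operator can be seen in a small sketch. This is not the source code of Figures 14 and 15, which is written differently; it is a hypothetical Python rendering of the same idea, with the larger input stored in `x` and the smaller in `y`.

```python
def gcd(a, b):
    """Euclid's algorithm with the larger value as dividend (x % y)."""
    x, y = max(a, b), min(a, b)
    while y != 0:
        x, y = y, x % y   # correct order: dividend x, divisor y
    return x

x, y = 12, 8
correct_remainder = x % y   # larger value divided by smaller value
swapped_remainder = y % x   # operands swapped: the smaller value comes back unchanged
```

With the operands swapped, the remainder simply equals the smaller input, so a test based on it no longer says anything about divisibility. The program still compiles, which is why such an error is invisible to a compiler.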

In addition to the above-mentioned source code evaluations and examples, we evaluated about 300 randomly chosen erroneous source codes using the LSTM and LSTM-AM models and found that their average accuracy values were approximately 31% and 62%, respectively. The detailed statistics are shown in Table 17.

Unlike the examples used in this study, program lengths can vary widely, with many programs containing 500 to 1000 lines of source code or more. One thing they all have in common is that variables and functions may be declared many lines before they are used. Therefore, an attention mechanism is needed to capture the long-term source code dependencies, as well as to evaluate source code errors correctly. Our experimental results have shown that the LSTM-AM model was much more successful on longer source code sequences than the standard LSTM model, as shown in Figure 17.
Additionally, some syntax and logical errors in source codes cannot be identified by traditional compilers. In such cases, our proposed LSTM-AM-based language model can provide meaningful responses to learners and professionals that can be used in the source code debugging and refactoring process. This can be expected to save time when working to detect errors in thousands of lines of source code, as well as to limit the area that must be searched to find the errors. Furthermore, the use of this intelligent support model can help learners and professionals to more easily find the logical and other critical errors in their source codes. Moreover, the classification accuracy of our proposed model is much better than that of the other state-of-the-art models. The average precision, recall, and F-measure scores of the LSTM-AM model are 97%, 96%, and 96%, respectively, which outperform those of the other state-of-the-art models.
8. Conclusion
In the present research, we proposed an AI-based model to assist students and programmers in source code completion. Our proposed model is expected to be effective in providing end-to-end solutions for programming learners and professionals in the SE fields. The experimental results obtained in this study show that the accuracy of error detection and prediction using our proposed LSTM-AM model is approximately 62%, whereas the standard LSTM model accuracy is approximately 31%. In addition, our approach provides the location numbers for the predicted errors, which effectively limits the area that must be searched to find errors, thereby reducing the time required to fix large source code sequences. Furthermore, our model generates probable correction words for each error location and detects logical and other errors that cannot be recognized by conventional compilers. Also, the LSTM-AM model shows greater success in source code classification than other state-of-the-art models. As a result, it is particularly suitable for application to long source code sequences and can be expected to contribute significantly to the source code debugging and refactoring process. Despite the above-mentioned advantages, our proposed model also has some limitations. For example, error detection and predictions are not always perfect, and the model sometimes fails to capture the semantic meaning of the source code, which leads to incorrect detections and predictions. Thus, our future work will use a bidirectional LSTM neural network to improve this intelligent support model for source code completion.
Data Availability
We acquired all the training and test source codes from the AOJ system. Resources are accessed from the following websites through the API: https://onlinejudge.u-aizu.ac.jp/ and http://developers.u-aizu.ac.jp/index.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was funded by the JSPS KAKENHI (grant no. 19K12252).
References
[1] H. K. Dam, T. Tran, and T. Pham, “A deep language model for software code,” 2016, https://arxiv.org/abs/1608.02715.
[2] M. Monperrus, “Automatic software repair: a bibliography,” ACM Computing Surveys, vol. 51, no. 1, pp. 1–24, 2018.
[3] H. Seo, C. Sadowski, S. Elbaum, E. Aftandilian, and R. Bowdidge, “Programmers’ build errors: a case study (at Google),” in Proceedings of the 36th International Conference on Software Engineering (ICSE’14), pp. 724–734, Hyderabad, India, May 2014.
[4] Y. Pu, K. Narasimhan, A. Solar-Lezama, and R. Barzilay, “sk_p: a neural program corrector for MOOCs,” in Proceedings of the 2016 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications: Software for Humanity, pp. 39–40, Amsterdam, Netherlands, November 2016.
[5] J. Pennington, R. Socher, and C. D. Manning, “Glove: global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, Doha, Qatar, October 2014.
[6] Z. Tu, Z. Su, and P. Devanbu, “On the localness of software,” in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), pp. 269–280, New York, NY, USA, November 2014.
[7] M. White, C. Vendome, M. Linares-Vásquez, and D. Poshyvanyk, “Toward deep learning software repositories,” in Proceedings of the 12th Working Conference on Mining Software Repositories (MSR’15), pp. 334–345, Florence, Italy, May 2015.
[8] Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu, “Advances in optimizing recurrent networks,” in Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8624–8628, Vancouver, Canada, May 2013.
[9] K. S. Tai, R. Socher, and C. D. Manning, “Improved semantic representations from tree-structured long short-term memory networks,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 1556–1566, Beijing, China, July 2015.
[10] J. Reyes, D. Ramírez, and J. Paciello, “Automatic classification of source code archives by programming language: a deep learning approach,” in Proceedings of the 2016 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 514–519, Las Vegas, NV, USA, December 2016.
[11] S. Bhatia, P. Kohli, and R. Singh, “Neuro-symbolic program corrector for introductory programming assignments,” in Proceedings of the 40th International Conference on Software Engineering (ICSE’18), pp. 60–70, Gothenburg, Sweden, May 2018.
[12] M. Pedroni and B. Meyer, “Compiler error messages: what can help novices?” in Proceedings of the 39th SIGCSE Technical Symposium on Computer Science Education, pp. 168–172, Portland, OR, USA, March 2008.
[13] R. Gupta, S. Pal, A. Kanade, and S. K. Shevade, “DeepFix: fixing common C language errors by deep learning,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), pp. 1345–1351, San Francisco, CA, USA, February 2017.
[14] M. M. Rahman, Y. Watanobe, and K. Nakamura, “Source code assessment and classification based on estimated error probability using attentive LSTM language model and its application in programming education,” Applied Sciences, vol. 10, no. 8, p. 2973, 2020.
[15] Y. Teshima and Y. Watanobe, “Bug detection based on LSTM networks and solution codes,” in Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3541–3546, Miyazaki, Japan, October 2018.
[16] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proceedings of the 3rd International Conference on Learning Representations (ICLR), pp. 1–15, San Diego, CA, USA, May 2015.
[17] J. Li, Y. Wang, M. R. Lyu, and I. King, “Code completion with neural attention and pointer networks,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18), pp. 4159–4165, Stockholm, Sweden, July 2018.
[18] J. Li, P. He, J. Zhu, and M. R. Lyu, “Software defect prediction via convolutional neural network,” in 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), pp. 318–328, Prague, Czech Republic, July 2017.
[19] H. K. Dam, T. Pham, S. W. Ng et al., “Lessons learned from using a deep tree-based model for software defect prediction in practice,” in 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 46–57, Montreal, Canada, May 2019.
[20] N.-Q. Pham, G. Kruszewski, and G. Boleda, “Convolutional neural network language models,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1153–1162, Austin, TX, USA, November 2016.
[21] G. Fan, X. Diao, H. Yu, Y. Kang, and L. Chen, “Software defect prediction via attention-based recurrent neural network,” Scientific Programming, vol. 2019, Article ID 6230953, 14 pages, 2019.
[22] S. F. Chen and J. Goodman, “An empirical study of smoothing techniques for language modeling,” in Proceedings of the 34th Annual Meeting on Association for Computational Linguistics (ACL’96), pp. 310–318, Santa Cruz, CA, USA, June 1996.
[23] M. Allamanis and C. Sutton, “Mining source code repositories at massive scale using language modeling,” in Proceedings of the 10th Working Conference on Mining Software Repositories (MSR’13), pp. 207–216, San Francisco, CA, USA, May 2013.
[24] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” 2012, https://arxiv.org/abs/1211.5063.
[25] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, “Recurrent models of visual attention,” in Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), pp. 2204–2212, Montreal, Canada, December 2014.
[26] T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1412–1421, Lisbon, Portugal, September 2015.
[27] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua, “Attentive collaborative filtering: multimedia recommendation with item- and component-level attention,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17), pp. 335–344, Shinjuku, Japan, August 2017.
[28] X. Ran, Z. Shan, Y. Fang, and C. Lin, “An LSTM-based method with attention mechanism for travel time prediction,” Sensors, vol. 19, no. 4, p. 861, 2019.
[29] Y. Yoshizawa and Y. Watanobe, “Logic error detection system based on structure pattern and error degree,” Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 5, pp. 1–15, 2019.
[30] T. Matsumoto and Y. Watanobe, “Towards hybrid intelligence for logic error detection,” Advancing Technology Industrialization Through Intelligent Software Methodologies, Tools and Techniques, vol. 318, pp. 120–131, 2019.
[31] J. Cheng, L. Dong, and M. Lapata, “Long short-term memory-networks for machine reading,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 551–561, Austin, TX, USA, November 2016.
[32] Y. Watanobe, “Aizu Online Judge,” 2018, https://onlinejudge.u-aizu.ac.jp/.
[33] Aizu Online Judge, “Developers site (API),” 2004, http://developers.u-aizu.ac.jp/index.
[34] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[35] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proceedings of the 3rd International Conference for Learning Representations (ICLR), pp. 1–13, San Diego, CA, USA, May 2015.
[36] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston, “Random forest: a classification and regression tool for compound classification and QSAR modeling,” Journal of Chemical Information and Computer Sciences, vol. 43, no. 6, pp. 1947–1958, 2003.
[37] I. Sutskever, G. E. Hinton, and G. W. Taylor, “The recurrent temporal restricted Boltzmann machine,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 1601–1608, Vancouver, Canada, December 2009.
[38] G. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
Copyright
Copyright © 2020 Md. Mostafizer Rahman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.