Abstract

Knowledge tracing (KT) is the task of modelling students’ knowledge state based on their historical interactions with intelligent tutoring systems. Existing KT models ignore both the relevance among the multiple knowledge concepts of a question and the characteristics of online tutoring systems. This paper proposes neural Turing machine-based skill-aware knowledge tracing (NSKT) for conjunctive skills, which captures the relevance among the knowledge concepts of a question to model students’ knowledge state more accurately and to discover more latent relevance among knowledge concepts effectively. We analyze the characteristics of three real-world KT datasets in depth. Experiments on these datasets show that NSKT outperforms state-of-the-art deep KT models on the AUC of prediction. This paper also explores the details of NSKT’s prediction process in modelling students’ knowledge state, as well as the relevance of knowledge concepts and the conditional influences between exercises.

1. Introduction

With the development of intelligent tutoring systems (ITSs) and the emergence of massive open online courses (MOOCs) [1, 2], knowledge tracing plays an important role in improving the efficiency of personalized learning platforms. Knowledge tracing is the task of modelling students’ knowledge state based on their historical interactions to predict students’ mastery of knowledge concepts (KCs), where a KC can be an exercise, a skill, or a concept [3, 4].

In order to better model students’ knowledge state, various knowledge-tracing models have been proposed. Among earlier approaches, Bayesian knowledge tracing (BKT) is a powerful knowledge-tracing model, which models students’ knowledge concept state by using a hidden Markov model (HMM) for each KC [5].

As deep learning develops, many deep learning models have been applied to KT. Chris Piech applied the recurrent neural network (RNN) to model the student learning process for the first time and proposed deep knowledge tracing (DKT) [6–9]. The dynamic key-value memory network (DKVMN) uses a static memory called key and a dynamic memory called value to discover latent relations between exercises and knowledge concepts [10, 11]. Self-attentive knowledge tracing (SAKT) proposes a self-attention-based KT model to model the students’ knowledge state, with exercises as attention queries and students’ past interactions as attention keys/values [3, 12–15].

However, the aforementioned works focus only on students’ exercise interactions and ignore the relations between questions and skills. Merely focusing on students’ interactions cannot model students’ knowledge state accurately. Knowledge-tracing models have therefore begun to pay attention to the structure of knowledge concepts [16–18].

Deep hierarchical knowledge tracing models students’ knowledge state by capturing the hierarchical structure of questions and knowledge concepts [16]. Neil Heffernan’s latest work considers the information of the question to which a knowledge concept belongs [17]. Graph-based knowledge tracing considers the influence among neighboring knowledge concepts [19–22]. The bipartite graph is an effective structural model for capturing latent relations between questions and skills [18]. This method is effective but computationally expensive, because it needs to extract questions and skills separately. It is therefore difficult to regard it as a streamlined and effective knowledge-tracing model.

None of the above KT models make full use of the multiknowledge concept information of questions. Existing knowledge-tracing models cannot capture latent relations between questions and concepts concisely and effectively. Questions are generally composed of multiple knowledge concepts, which are closely related to each other. In order to better model the students’ learning process, our model is constructed with neural Turing machines (NTMs), an instance of memory-augmented neural networks (MANNs) with a large external memory capacity [23–25]. Therefore, on the basis of the above deep knowledge-tracing models, we propose an NTM-based skill-aware knowledge-tracing model. The highlight of our work is to utilize the knowledge concept composition information of questions to model the students’ knowledge state more accurately and to discover more latent relevance among knowledge concepts effectively. The contributions of this paper are summarized as follows:
(i) We process the real-world KT datasets in detail and discover new characteristics of online tutoring systems and knowledge-tracing datasets.
(ii) We design a question-skill dictionary algorithm to obtain the conjunctive skills of questions. The input encoding contains both students’ answering interaction information and the related knowledge concept information.
(iii) We apply neural Turing machines to knowledge tracing innovatively to enhance the memory capacity of our model, to predict students’ mastery of knowledge concepts accurately, and to discover knowledge concept substructure effectively.
(iv) We propose a novel NTM-based skill-aware knowledge-tracing model for conjunctive skills and apply a novel loss optimization function to deep knowledge tracing to enhance the model’s ability of skill awareness. Our model considers the conjunctive knowledge concept information contained in a question when modelling the students’ knowledge state; thus, it outperforms existing KT models.

The rest of this paper is organized as follows: Section 2 presents a brief overview of related work in the field of knowledge tracing. In Section 3, we formulate the process for NSKT to perform the knowledge-tracing task. Then, Section 4 introduces the characteristics and classifications of online tutoring systems. The details of the NSKT model are provided in Section 5. The experimental results and the comparison of models’ performance in the real-world datasets are given in Section 6. In Section 7, we discuss in detail the process of NSKT in modelling the students’ knowledge state. Section 8 presents the conclusions and future studies of this work.

2. Related Work

In this section, we present a brief overview of the models and methods of related work in the field of knowledge tracing, which can be classified into two main categories, as shown in Table 1.

2.1. Item Response Theory

Before knowledge tracing was proposed in 1995, item response theory (IRT) was the most commonly used cognitive model for predicting students’ mastery of knowledge concepts [26, 27]. On the basis of IRT, factor-analysis-based cognitive models of the students’ knowledge state were later proposed: LFA [28] and PFA [29]. These logistic regression models predict students’ mastery of knowledge concepts by analyzing the relationship among factors that affect students’ answering accuracy [30, 31].

2.2. Knowledge Tracing

Bayesian knowledge tracing (BKT) models the students’ knowledge state by using the hidden Markov model (HMM) for a single knowledge concept, which is represented as a set of binary latent variables [5].

With the rise of deep learning, deep knowledge tracing (DKT) was proposed in [6]; it regards students’ historical interactions as time sequences and models the students’ knowledge state with a recurrent neural network (RNN). The experimental results show that DKT is powerful at modelling the students’ knowledge state. After DKT, many deep KT models were proposed to improve the AUC of predicting students’ mastery of knowledge concepts. However, most of these deep knowledge-tracing models focus only on students’ interactions with knowledge concepts and ignore the structural relationship between questions and knowledge concepts.

2.3. Question-KC Relation in Knowledge Tracing

Cen et al. proposed two IRT-based models, the additive factor model (AFM) and the conjunctive factor model (CFM), to model the conjunctive skills in student datasets [32]. Both AFM and CFM consider the conjunctive skill information contained in an item to predict the probability of students answering the item correctly.

Deep hierarchical knowledge tracing (DHKT) begins to focus on the hierarchical relationship between knowledge concepts and questions to predict the performance of students [16]. DHKT trains a question embedding as the average of the embeddings of the skills belonging to the question. The model using bipartite graphs can capture relationships between knowledge concepts and questions effectively and systematically to pretrain question embeddings for each question [18]. Neil Heffernan’s latest work also focuses on the architecture of knowledge concepts and questions [17].

3. Problem Formulation

Generally, KT can be formulated as a supervised sequence learning problem. The student’s interaction tuple at the timestamp $t$ is $x_t = (q_t, a_t)$, which represents the combination of which skill (exercise) $q_t$ was answered and whether it was answered correctly, so $a_t \in \{0, 1\}$ and $q_t \in \{1, 2, \dots, E\}$, where $E$ is the number of unique exercises in the dataset. Given the student’s past exercise interactions $X = (x_1, x_2, \dots, x_t)$, the goal of KT is to predict the probability that the student will answer question $q_{t+1}$ correctly at the next timestamp $t+1$, $P(a_{t+1} = 1 \mid q_{t+1}, X)$ [3, 6, 10].

It can be seen that existing KT models focus only on students’ exercise interactions, so it is difficult for them to predict students’ mastery of skills effectively. The notations used in this paper are shown in Table 2.

Definition 1. Related knowledge concepts (RKCs): the related knowledge concepts (RKCs) $R_{s_i}$ of a knowledge concept $s_i$ are the other knowledge concepts that compose a question together with $s_i$, where $s_i$ and each $s_j \in R_{s_i}$ are mutually conjunctive knowledge concepts (skills).
Algorithm 1 processes the skills and the questions of the dataset to obtain a dictionary $D$ with the question number as the key and the conjunctive skills of the question as the value, where conjunctive skills are the skills that make up the same question. Given its nested loops over the question list and the dataset, the time complexity of Algorithm 1 is $O(|Q| \cdot n)$, where $|Q|$ is the number of unique questions and $n$ is the number of records in the dataset. In this paper, we use KC, as shown in Table 2, to represent skill. Let $R_{s_t}$ be the RKCs related to the KC $s_t$ of the answered question $q_t$, as illustrated in Figure 1(a).

Input: dataset of (question, skill) records.
Output: dictionary D mapping each question to its conjunctive skills.

# Runnable Python rendering of Algorithm 1 (variable names restored
# from context; it mirrors the original two-pass pseudocode).
def build_question_skill_dict(dataset):
    Q = []                                   # unique question list
    for question_id, _ in dataset:
        if question_id not in Q:
            Q.append(question_id)
    D = {}
    for q in Q:
        skills = []                          # conjunctive skills of q
        for question_id, skill_id in dataset:
            if question_id == q and skill_id not in skills:
                skills.append(skill_id)
        D[q] = skills
    return D
The skill-aware knowledge-tracing model can be formulated as follows. The student’s interaction at the timestamp $t$ is $x_t = (s_t, a_t, R_{s_t}, \tilde{a}_t)$, where $a_t \in \{0, 1\}$ is the correctness of the answer to the question $q_t$ on skill $s_t$, $R_{s_t}$ is the set of RKCs of the KC $s_t$, and $\tilde{a}_t$ is the correctness assigned to the RKCs $R_{s_t}$. Given the student’s past interactions $X = (x_1, x_2, \dots, x_t)$, we can predict the probability that the student will answer the next KC $s_{t+1}$ correctly at the timestamp $t+1$, $P(a_{t+1} = 1 \mid s_{t+1}, X)$, or predict the student’s mastery of all knowledge concepts, $P(a_{t+1} = 1 \mid s, X)$ for every KC $s$.
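To make this formulation concrete, a minimal sketch of one skill-aware interaction tuple follows; the container and field names are our own illustration, not the paper’s implementation.

from dataclasses import dataclass
from typing import List

# A hypothetical container for one skill-aware interaction x_t.
@dataclass
class Interaction:
    skill: int              # s_t: index of the answered KC
    correct: int            # a_t in {0, 1}
    rkcs: List[int]         # R_{s_t}: indices of the related KCs
    rkc_correct: List[int]  # correctness assigned to each RKC

# Example: skill 32 answered correctly; skill 33 composes the same
# question, so it is the RKC and (at question level) shares a_t.
x_t = Interaction(skill=32, correct=1, rkcs=[33], rkc_correct=[1])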

4. Online Tutoring Systems

The online tutoring systems can be classified into two categories:

4.1. Question-Level Online Tutoring Systems

In question-level online tutoring systems, students answer the question directly. If the question is answered correctly or incorrectly, all KCs (skills) of the question are answered correctly or incorrectly too. So if a student has answered $s_t$ correctly or incorrectly, then they must have answered the RKCs $R_{s_t}$ correctly or incorrectly too, as illustrated in Figure 1(b). Because $s_t$ and $R_{s_t}$ are from the same question, in question-level online tutoring systems, for a student’s interaction at the timestamp $t$:

$\tilde{a}_t = a_t. \quad (1)$

4.2. Skill-Level Online Tutoring Systems

The question-answering situation in skill-level online tutoring systems is much more complicated than that in question-level online tutoring systems. Students can individually answer one of the skills in a question and can answer this skill once or multiple times. So if a student answers $s_t$ correctly, it does not mean that the student must answer the RKCs $R_{s_t}$ correctly, as shown in Figure 1(c).

Superficially, there is no obvious answering-correctness relationship between a skill $s_t$ and its related skill set $R_{s_t}$. However, skill-level online tutoring systems contain a large number of answering examples such as those shown in Table 3: if a student answers $s_t$ incorrectly many times, then even if he finally answers it correctly, his mastery of skill $s_t$ is very poor; similarly, he has poor mastery of $R_{s_t}$, and it is very likely that he will answer $s_t$’s related skills incorrectly. So the student’s mastery of $s_t$ and the student’s mastery of $R_{s_t}$ are close:

$P(\tilde{a}_t = 1 \mid X) \approx P(a_t = 1 \mid X). \quad (2)$

This finding is strongly supported by the actual responses of students in skill-level online tutoring systems. So in skill-level online tutoring systems, according to formula (2), we can assume $\tilde{a}_t \approx a_t$, as shown in Table 4.

5. Method

In this section, we give a detailed introduction to our NSKT framework, whose overall architecture is given in Figure 2.

5.1. Model

The model consists of an encoding layer and a neural network layer. In order to better model the students’ knowledge state, the model is constructed with the neural Turing machine, which is an instance of memory-augmented neural networks (MANNs) that offer the ability to quickly encode and retrieve new information [23].

5.2. Input Features
5.2.1. Answer Information Encoding

Let $\mathbf{v}_t$ be the encoding of the student’s interaction tuple $(s_t, a_t)$; thus, $\mathbf{v}_t \in \{0, 1\}^{2M}$, where $M$ is the number of unique skills:

$\mathbf{v}_t = \boldsymbol{\delta}(s_t + a_t \cdot M),$

with $\boldsymbol{\delta}(\cdot)$ denoting the one-hot encoding.

5.2.2. RKC Information Encoding

The information of the set $R_{s_t}$ of RKCs related to the KC $s_t$ is encoded as a vector $\mathbf{r}_t$ of length $M$: $\mathbf{r}_t(j) = 1$ if skill $s_j \in R_{s_t}$ and $\mathbf{r}_t(j) = 0$ otherwise.
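As an illustration, the sketch below builds both encodings, assuming a DKT-style one-hot layout over 2M positions for the interaction and a multi-hot vector over M positions for the RKC set; the exact layout used by NSKT may differ.

import numpy as np

def encode_interaction(skill, correct, num_skills):
    # One-hot over 2M positions: slot `skill` if incorrect,
    # slot `skill + M` if correct.
    v = np.zeros(2 * num_skills, dtype=np.float32)
    v[skill + correct * num_skills] = 1.0
    return v

def encode_rkcs(rkcs, num_skills):
    # Multi-hot over M positions for the RKC set R_{s_t}.
    r = np.zeros(num_skills, dtype=np.float32)
    r[list(rkcs)] = 1.0
    return r

# Example with M = 100: skill 32 answered correctly, RKC {33}.
x_t = np.concatenate([encode_interaction(32, 1, 100),
                      encode_rkcs([33], 100)])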

5.3. Neural Turing Machines

Neural Turing machines are an instance of memory-augmented neural networks (MANNs) that extend the capabilities of neural networks by coupling them to external memory resources. Experiments show that neural Turing machines have stronger memory capabilities than the LSTM [23], which makes them very suitable for modelling the students’ knowledge state [33–35]. Figure 3 shows a high-level diagram of the neural Turing machine architecture.

As can be seen from Figure 3, the NTM is composed of 4 modules: controller, read heads, write heads, and memory. The controller can be a feed-forward neural network or a recurrent neural network [23, 34] and has read and write heads that access the external memory matrix.

5.4. Reading

Let $M_t$ be the external memory content, an $N \times W$ memory matrix at the timestamp $t$, where $N$ is the number of memory locations and $W$ is the vector dimension at each memory location. The elements of $\mathbf{w}_t$, a vector of weightings over the $N$ locations emitted by a read head at the timestamp $t$, obey the following constraints:

$\sum_{i=1}^{N} w_t(i) = 1, \qquad 0 \le w_t(i) \le 1, \; \forall i.$

Let $\mathbf{r}_t$ be the read vector of length $W$ returned by the head at the timestamp $t$:

$\mathbf{r}_t = \sum_{i=1}^{N} w_t(i)\, M_t(i).$

5.5. Writing

The memory matrix at the timestamp $t$ is modified by the erase vector $\mathbf{e}_t$ and the add vector $\mathbf{a}_t$:

$\tilde{M}_t(i) = M_{t-1}(i)\left[\mathbf{1} - w_t(i)\,\mathbf{e}_t\right], \qquad M_t(i) = \tilde{M}_t(i) + w_t(i)\,\mathbf{a}_t.$
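For concreteness, a minimal NumPy sketch of the read and write operations above, following the NTM equations [23]; the function and variable names are ours.

import numpy as np

def ntm_read(memory, w):
    # r_t = sum_i w_t(i) M_t(i): weighted combination of memory rows.
    return w @ memory                      # (N,) @ (N, W) -> (W,)

def ntm_write(memory, w, erase, add):
    # Per-location erase then add:
    # M_t(i) = M_{t-1}(i) * (1 - w(i) e_t) + w(i) a_t.
    memory = memory * (1.0 - np.outer(w, erase))
    return memory + np.outer(w, add)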

5.6. Addressing Mechanisms
5.6.1. Focusing on Content

Each head produces a key vector $\mathbf{k}_t$ of length $W$ that is used to compute the normalised content weighting $\mathbf{w}_t^{c}$ as follows:

$w_t^{c}(i) = \frac{\exp\left(\beta_t K(\mathbf{k}_t, M_t(i))\right)}{\sum_{j} \exp\left(\beta_t K(\mathbf{k}_t, M_t(j))\right)},$

where $\beta_t$ is a positive key strength generated by the controller and the similarity measure $K(\cdot, \cdot)$ is the cosine similarity:

$K(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}.$

5.6.2. Focusing on Location

The location-based addressing mechanism is designed to facilitate both simple iterations across the locations of the memory and random-access jumps. It does so by implementing a rotational shift of a weighting as follows [23].

Firstly, the interpolation gate $g_t \in (0, 1)$ is used to blend between the content weighting $\mathbf{w}_t^{c}$ and the previous weighting $\mathbf{w}_{t-1}$:

$\mathbf{w}_t^{g} = g_t \mathbf{w}_t^{c} + (1 - g_t)\,\mathbf{w}_{t-1}.$

Furthermore, the model uses a one-dimensional convolutional shift kernel to convolve the current weighting $\mathbf{w}_t^{g}$:

$\tilde{w}_t(i) = \sum_{j=0}^{N-1} w_t^{g}(j)\, s_t\left((i - j) \bmod N\right),$

where $\mathbf{s}_t$ is the shift weighting generated by the controller.

To correct the blur caused by the convolution operation, each head emits one further scalar $\gamma_t \ge 1$ whose effect is to sharpen the final weighting as follows:

$w_t(i) = \frac{\tilde{w}_t(i)^{\gamma_t}}{\sum_{j} \tilde{w}_t(j)^{\gamma_t}}.$
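Putting the four addressing steps together, a minimal NumPy sketch follows; it assumes a length-N shift kernel, and the names are ours.

import numpy as np

def ntm_address(memory, key, beta, g, shift, gamma, w_prev):
    # 1) Content focus: softmax of beta-scaled cosine similarities.
    sim = memory @ key / (np.linalg.norm(memory, axis=1)
                          * np.linalg.norm(key) + 1e-8)
    wc = np.exp(beta * sim)
    wc /= wc.sum()
    # 2) Interpolate with the previous weighting via gate g.
    wg = g * wc + (1.0 - g) * w_prev
    # 3) Circular convolution with the shift kernel s.
    n = len(wg)
    wt = np.array([sum(wg[j] * shift[(i - j) % n] for j in range(n))
                   for i in range(n)])
    # 4) Sharpen with gamma >= 1 and renormalise.
    wt = wt ** gamma
    return wt / wt.sum()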

5.7. Controller

The NTM controller in our model is the long short-term memory network [36], which can be formulated as follows:

$\mathbf{i}_t = \sigma(\mathbf{W}_i [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_i),$
$\mathbf{f}_t = \sigma(\mathbf{W}_f [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_f),$
$\mathbf{o}_t = \sigma(\mathbf{W}_o [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_o),$
$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tanh(\mathbf{W}_c [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_c),$
$\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t).$

$\mathbf{i}_t$, $\mathbf{f}_t$, $\mathbf{o}_t$, $\mathbf{c}_t$, and $\mathbf{h}_t$ are the activations of the input gate, the forget gate, the output gate, the memory cell, and the hidden state, respectively. $\mathbf{W}$ and $\mathbf{b}$ are the weight matrix and the bias vector of the corresponding gate, respectively. $\odot$ denotes the Hadamard product. $\sigma$ and $\tanh$ denote the sigmoid and hyperbolic tangent functions, respectively. Let $\mathbf{z}_t$ be the output of the last neural network layer of the NSKT model; the student’s mastery of knowledge concepts predicted by the model at the timestamp $t$ is

$\mathbf{y}_t = \sigma(\mathbf{z}_t),$

where $\mathbf{y}_t \in (0, 1)^{M}$.

5.8. Optimization

The loss function of the model consists of two parts, namely, the answering interaction loss $\mathcal{L}_1$ and the related knowledge concept information loss $\mathcal{L}_2$. Let $\ell$ be the binary cross-entropy loss:

$\ell(\hat{y}, y) = -\left(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\right).$

We optimize the average cross-entropy loss of the student’s interactions as follows:

$\mathcal{L}_1 = \frac{1}{T} \sum_{t} \ell\left(\boldsymbol{\delta}(s_{t+1})^{\mathsf{T}} \mathbf{y}_t,\; a_{t+1}\right),$

where $\boldsymbol{\delta}(s_{t+1})$ is the one-hot encoding of the KC $s_{t+1}$ at the timestamp $t+1$, $T$ is the total number of the student’s interactions, and $\mathsf{T}$ denotes the transpose operation.

The average cross-entropy loss of the related knowledge concept information is

$\mathcal{L}_2 = \frac{1}{T} \sum_{t} \frac{1}{|R_{s_t}|} \sum_{s_j \in R_{s_t}} \ell\left(\boldsymbol{\delta}(s_j)^{\mathsf{T}} \mathbf{y}_t,\; \tilde{a}_j\right),$

where $s_j \in R_{s_t}$ and $\tilde{a}_j$ is the correctness assigned to skill $s_j$.

The loss for a single student is represented by $\mathcal{L}$, which is as follows:

$\mathcal{L} = \lambda \mathcal{L}_1 + (1 - \lambda)\mathcal{L}_2,$

where the hyperparameter $\lambda$ is the coefficient that determines the proportion of the answering information loss and the related information loss. We use an optimizer to optimize our model; the training objective of NSKT is to find the parameters $\theta^{*}$ that minimize $\mathcal{L}$:

$\theta^{*} = \arg\min_{\theta} \mathcal{L}.$
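A minimal PyTorch-style sketch of the combined objective follows; the batching, the masking scheme, and the default λ = 0.8 are our assumptions for illustration, not the paper’s implementation.

import torch
import torch.nn.functional as F

def nskt_loss(y, next_skill, next_correct, rkc_mask, rkc_correct, lam=0.8):
    # y: (T, M) predicted mastery probabilities y_t.
    # next_skill: (T,) long index of the KC answered at t+1.
    # next_correct: (T,) observed a_{t+1} in {0, 1}.
    # rkc_mask: (T, M) multi-hot RKC indicator; rkc_correct: (T, M) targets.
    p_next = y.gather(1, next_skill.unsqueeze(1)).squeeze(1)
    l1 = F.binary_cross_entropy(p_next, next_correct.float())
    per_elem = F.binary_cross_entropy(y, rkc_correct.float(),
                                      reduction="none")
    # Average the RKC loss over the related concepts only.
    l2 = (per_elem * rkc_mask).sum() / rkc_mask.sum().clamp(min=1.0)
    return lam * l1 + (1.0 - lam) * l2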

5.9. Skill Awareness

The student’s past interactions in online tutoring systems are $X = (x_1, x_2, \dots, x_t)$, where $x_\tau$ denotes the student’s interaction tuple at the timestamp $\tau$. The set of knowledge concepts that the student has actually answered so far is represented as follows:

$A_t = \{ s_\tau \mid \tau \le t \}.$

The set of knowledge concepts (skills) covered by NSKT so far, including the RKCs of each answered skill, is represented as follows:

$B_t = \bigcup_{\tau \le t} \left( \{ s_\tau \} \cup R_{s_\tau} \right).$

As shown in Figure 4, when the student answers the next skill $s_{t+1}$ at the next timestamp $t+1$, even if the student has not answered questions related to skill $s_{t+1}$ before ($s_{t+1} \notin A_t$), if NSKT has awareness of skill $s_{t+1}$ so far ($s_{t+1} \in B_t$), NSKT can predict the student’s mastery of skill $s_{t+1}$ accurately.

6. Experiments

In this section, we give a detailed explanation of the datasets and the experiments conducted to evaluate the performance of the NSKT model and other KT models on three real-world open-source knowledge-tracing datasets.

6.1. Datasets

To evaluate KT models’ performance, we use three datasets collected from online learning platforms. These three datasets are widely used real-world datasets in KT:
(i) ASSISTments2009 (https://sites.google.com/site/assistmentsdata/home/2009-2010-assistment-data) (ASSIST09) is provided by the ASSISTments online tutoring platform and is the most widely used dataset in knowledge tracing.
(ii) ASSISTments2017 (https://sites.google.com/view/assistmentsdatamining/dataset/) (ASSIST17) is provided by the 2017 ASSISTments data mining competition and is the latest ASSISTments dataset with the most student responses.
(iii) EdNet (https://github.com/riiid/ednet) is the dataset of all student-system interactions collected over 2 years by Santa, a multiplatform AI tutoring service with more than 780 K users in Korea, available through Android, iOS, and the Web [37]. We conducted our experiments on EdNet-KT1, which consists of students’ question-solving logs and is the record of Santa collected since April 2017, following the question-response sequence format.

The complete statistical information for the three datasets is shown in Table 5.

The details of the columns in the datasets are as follows:
ASSISTments:
(i) user_id: the ID of the student
(ii) problem_id: the ID of the problem
(iii) skill_id: the ID of the skill associated with the problem
(iv) correct: 1 = correct on the first attempt; 0 = incorrect on the first attempt
EdNet:
(i) user_id: the ID of the student
(ii) question_id: the ID of the question
(iii) tags: the expert-annotated tags for each question
(iv) correct_answer: the correct answer to each question, recorded as a character between a and d inclusive
(v) user_answer: the answer that the student submitted, recorded as a character between a and d inclusive

6.2. Dataset Characteristics
(i) ASSIST09 and EdNet: for multiple-skill questions, the records of students’ interactions are repeated with different skill taggings, and each record represents the student’s response to one skill of the question [38].
(ii) ASSIST17: similar to the ASSIST09 dataset, each record in ASSIST17 represents the student’s response to one skill of the question. However, we noticed a special feature of this dataset: a large number of users in ASSIST17 answered only one skill of a multiple-skill question, and answered this skill one or more times. Multiple-skill questions in this situation accounted for 44.88% of the total number of questions answered by students. That is, a student may answer one or more skills of a multiple-skill question, and a skill may be answered once or multiple times.
6.3. Compared Models and Implementation Details

To show the performance of our model and demonstrate its improvement over existing KT models, we compare NSKT against the state-of-the-art KT models. We give the reference GitHub repositories of some KT models:
(i) BKT [5]: Bayesian knowledge tracing uses the hidden Markov model (HMM) to model the students’ latent knowledge state as a set of binary variables. We use pyBKT (https://github.com/CAHLR/pyBKT) to implement BKT.
(ii) DKT-LSTM [6]: DKT-LSTM is the standard deep knowledge-tracing model. We implemented DKT (https://github.com/chrispiech/DeepKnowledgeTracing) with the LSTM with tanh activation.
(iii) DKT-NTM: DKT implemented using the neural Turing machine (https://github.com/MarkPKCollier/NeuralTuringMachine).
(iv) DKVMN [10]: the DKVMN (https://github.com/jennyzhang0215/DKVMN) is a variation of MANNs, which uses a static memory called key and a dynamic memory called value to model the students’ knowledge state.
(v) SAKT [3]: SAKT (https://github.com/shalini1194/SAKT) is the KT model based on the self-attention architecture, with exercises as attention queries and students’ past interactions as attention keys/values.
(vi) DSKT: the skill-aware deep knowledge-tracing model implemented with the LSTM and tanh activation. We dynamically set the value of the coefficient λ to explore the best performance of DSKT.
(vii) NSKT: NSKT is the NTM-based skill-aware knowledge-tracing model. We test the performance of NSKT with different values of the coefficient λ to optimize the model’s performance.

For all models, we use the Adam optimizer. The minibatch size and the maximum sequence length for all datasets are set to 32 and 100, respectively. We perform standard five-fold cross-validation to evaluate all the KT models in this paper. We conduct experiments on a server with an 8-core 2.50 GHz Intel(R) Xeon(R) Platinum 8163 CPU and 64 GB memory.

6.4. Experimental Results
6.4.1. Models’ Performance

We use the area under the receiver operating characteristic curve (AUC) as the evaluation metric to compare prediction performance among the KT models described in Section 6.3. A higher AUC indicates better performance. The test AUC results on the three real-world datasets for all KT models are shown in Table 6. From the experimental results, we make the following observations:
(i) NSKT performs better than the other competing KT models on all datasets and achieves average test AUCs of 85.38%, 82.35%, and 80.81% on ASSIST09, ASSIST17, and EdNet, respectively.
(ii) DSKT performs better than DKT-LSTM, achieving average test AUCs of 84.88%, 81.27%, and 79.71% on ASSIST09, ASSIST17, and EdNet, respectively, for an average performance improvement of 0.82% (DKT-LSTM achieves AUCs of 84.45%, 80.04%, and 78.91%). NSKT performs better than DKT-NTM, gaining an average performance improvement of 1.33% (DKT-NTM achieves AUCs of 84.53%, 80.51%, and 79.49%).
(iii) DKT-NTM performs better than the standard DKT-LSTM in knowledge tracing: DKT-NTM achieves average test AUCs of 84.53%, 80.51%, and 79.49% on the three datasets, respectively, while the standard DKT-LSTM achieves 84.45%, 80.04%, and 78.91%.
(iv) The improvement of NSKT is larger on ASSIST17, which has more complex data features than ASSIST09 and EdNet. NSKT gains an average performance improvement of 2.31% on ASSIST17 compared to the standard DKT-LSTM, while improving AUC by 0.93% and 0.90% on ASSIST09 and EdNet, respectively. This shows that NSKT is better at mining hidden information from complex educational data features to improve prediction accuracy.

Figure 5 shows the training process of the KT models on the three KT datasets. It shows that DKVMN and SAKT learn faster than the other KT models. The training speeds of DKT-LSTM, DKT-NTM, DSKT, and NSKT are close, but the test AUC of NSKT is the best.

Let $p_t$ be the probability of answering the KC $s_t$ correctly as predicted by a KT model. We assume that the student will answer the KC correctly if $p_t \ge 0.5$ and incorrectly if $p_t < 0.5$:

$\hat{a}_t = \begin{cases} 1, & p_t \ge 0.5, \\ 0, & p_t < 0.5. \end{cases}$

If $\hat{a}_t = a_t$, the model predicts correctly. The resulting prediction accuracy of the KT models on the datasets is shown in Figure 6.
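The thresholded accuracy in Figure 6 can be computed as in this short sketch (function name is ours):

import numpy as np

def prediction_accuracy(probs, answers, threshold=0.5):
    # probs: predicted p_t; answers: observed a_t in {0, 1}.
    predicted = (np.asarray(probs) >= threshold).astype(int)
    return float((predicted == np.asarray(answers)).mean())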

Figure 7 shows the performance of DSKT and NSKT under different values of $\lambda$ and the value of $\lambda$ at which the models achieve the best performance. From Figure 7, we can draw the following conclusions: the test AUC of DSKT and NSKT is not ideal with a small value of $\lambda$; however, as the value of $\lambda$ increases, the test results of DSKT and NSKT get better and better. We therefore recommend a relatively large value of $\lambda$.

6.5. Friedman-Aligned Rank Test

We perform the Friedman-aligned rank test [39] on the AUC test results of the KT models shown in Table 6 with the following statistic:

$T = \dfrac{(k - 1)\left[\sum_{j=1}^{k} \hat{R}_{\cdot j}^{2} - \frac{k n^{2}}{4}(k n + 1)^{2}\right]}{\frac{k n (k n + 1)(2 k n + 1)}{6} - \frac{1}{k}\sum_{i=1}^{n} \hat{R}_{i \cdot}^{2}},$

where $\hat{R}_{i \cdot}$ is the sum of the aligned ranks of the $i$-th sample, $\hat{R}_{\cdot j}$ is the sum of the aligned ranks of the $j$-th group, $k$ is the number of groups of samples, and $n$ is the number of samples in each group. The probability distribution of $T$ can be approximated by that of the chi-squared distribution with $k - 1$ degrees of freedom. Now, we test the null hypothesis, which is as follows: H0: there is no significant difference in the performance of the KT models.
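A sketch of the test on an n × k table of AUC scores follows, assuming the aligned-ranks statistic of [39]; the function name is ours.

import numpy as np
from scipy.stats import chi2, rankdata

def friedman_aligned_ranks(results):
    # results: (n, k) array, rows = datasets (blocks), cols = KT models.
    results = np.asarray(results, dtype=float)
    n, k = results.shape
    # Align each block by its mean, then rank all n*k values jointly.
    aligned = results - results.mean(axis=1, keepdims=True)
    ranks = rankdata(aligned.ravel()).reshape(n, k)
    r_model = ranks.sum(axis=0)      # rank sums per model
    r_block = ranks.sum(axis=1)      # rank sums per dataset
    num = (k - 1) * ((r_model ** 2).sum()
                     - (k * n ** 2 / 4.0) * (k * n + 1) ** 2)
    den = (k * n * (k * n + 1) * (2 * k * n + 1)) / 6.0 \
          - (r_block ** 2).sum() / k
    stat = num / den
    return stat, chi2.sf(stat, df=k - 1)   # statistic and P value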

The P value of the Friedman-aligned rank test on the test AUC results is below the 0.05 significance level.

Then, we reject the null hypothesis $H_0$, which indicates a significant difference in the performance of the KT models.

6.6. Execution Time

We compared the execution time of the KT models per 200 batches on each dataset, as shown in Figure 8. The BKT model requires the least execution time to train on the same amount of data. This is because BKT is not a deep learning model and has fewer parameters to train. Among the deep learning knowledge-tracing models, the execution times of DKT-LSTM, DKVMN, and SAKT are close, and the execution times of DKT-NTM and DSKT are close. The execution time of DKT-NTM is longer than that of DKT-LSTM; the reason may be that the NTM takes extra time to access its external memory matrix. NSKT considers the conjunctive skills of the questions during training and needs to access the NTM’s external memory matrix to enhance the memory ability of the model. Hence, NSKT has the longest execution time, but this is also the reason why NSKT performs better in modelling the students’ knowledge state.

The experimental results show that the NTM-based skill-aware knowledge-tracing model has a strong ability to capture the relevance among knowledge concepts and can enhance the model’s ability of skill awareness for conjunctive skills and improve the accuracy of prediction in modelling the students’ knowledge state. Experiments demonstrate that NSKT is effective.

7. Discussion

In this section, we discuss the details of the prediction process of the KT model in modelling the students’ knowledge state, as well as the relevance of knowledge concepts and conditional influence between exercises.

7.1. Prediction Process

In our opinion, an excellent KT model not only can predict the probability that students will answer questions correctly at the next timestamp accurately but also can perform well in modelling the students’ holistic knowledge concept state.

Analyzing the prediction process of KT models can show the performance of NSKT. We randomly select a student sample from the ASSIST09 dataset, and the detailed process of DKT and NSKT modelling this student’s knowledge state is shown in Figure 9.

It can be seen from Figure 9(a) that although DKT performs fairly well in prediction, DKT focuses only on the knowledge concepts to be predicted at the next timestamp and does not care about the student’s mastery of other knowledge concepts. Therefore, after the student answers a skill correctly at a given timestamp, the model’s predicted probability for that skill decreases rapidly, indicating that the student’s mastery of it is getting worse and worse, which is not consistent with the student’s real knowledge state shown in Table 7. Lacking related knowledge concept (RKC) information, DKT’s prediction accuracy and prediction breadth are not ideal.

As shown in Figure 9(b), we use two heatmap subfigures to show the process of modelling this student’s knowledge state with NSKT. The x-axis of the lower subfigure is the sequence of the student’s interactions, and the y-axis is the skill index. The x-axis of the upper subfigure is the answered skill, and the y-axis is the index of its RKCs.

Because the student answers skill 32 (abbreviated as $s_{32}$) correctly in the first three timestamps, the predicted probability of $s_{32}$ gets higher and higher and the color of $s_{32}$ on the y-axis of the lower subfigure gets brighter and brighter. As shown on the x-axis of the upper subfigure, $s_{33}$ is the related knowledge concept of $s_{32}$ in the first three timestamps; thus, the predicted probability of $s_{33}$ gets higher and higher and the color of $s_{33}$ on the y-axis of the upper subfigure gets brighter and brighter too.

In the next three timestamps, the student answers $s_{33}$ correctly in succession; the predicted probability of $s_{33}$ gets higher and higher, and the color of $s_{33}$ on the y-axis of the lower subfigure gets brighter and brighter. $s_{32}$ is the related knowledge concept of $s_{33}$, so the predicted probability of $s_{32}$ continues to increase, and the color of $s_{32}$ on the y-axis of the upper subfigure gets brighter and brighter too, remaining at a relatively high value.

In the next three timestamps, the student continues to answer another skill correctly; however, this is a single skill without related knowledge concepts, so only the predicted probability of that skill gets higher and higher and its color on the y-axis of the lower subfigure gets brighter and brighter.

At the last timestamp, the student again answers one of the conjunctive skills correctly, so its predicted probability gets higher and the color of that skill on the y-axis of the lower subfigure gets brighter. Because the other conjunctive skill is its related knowledge concept, the predicted probability of that RKC gets higher and its color on the y-axis of the upper subfigure gets brighter too.

In contrast, we randomly select a second student sample, with a low answering accuracy, shown in Table 8.

The process of DKT and NSKT modelling this student’s knowledge state is shown in Figure 10. It can be seen from Figure 10(a) that DKT models this student’s knowledge state almost accurately, but its prediction breadth is not enough.

As shown in Figure 10(b), NSKT, like DKT, models this student’s knowledge state accurately and performs better in prediction breadth. At the corresponding timestamps, the student answers a skill incorrectly many times; the predicted probability of that skill gets lower and lower, and its color on the y-axis of the lower subfigure gets darker and darker. As shown on the x-axis of the upper subfigure, the predicted probability of its related knowledge concept gets lower and lower as well, and its color on the y-axis of the upper subfigure gets darker and darker too.

It can be concluded from Figures 9 and 10 that NSKT performs better in prediction accuracy and prediction breadth and can better model the students’ knowledge state. NSKT not only focuses on students’ mastery of the knowledge concept to be predicted at the next timestamp but also focuses on the students’ mastery of the related knowledge concepts. This is where NSKT is superior to other existing KT models, and NSKT performs better in modelling the students’ knowledge state than DKT [4].

7.2. Pearson Correlation Coefficient

In this paper, we use the Pearson correlation coefficient as the metric to measure the correlation among skills. By estimating the covariance and standard deviations of the sample, we obtain the sample Pearson coefficient $r_{xy}$:

$r_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}.$
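For instance, the skill-level correlations in Figures 11 and 12 can be estimated from two skills’ predicted mastery sequences, as in this sketch (function name is ours):

import numpy as np

def skill_pearson(x, y):
    # Sample Pearson coefficient between two skills' predicted
    # mastery sequences over a student's interactions.
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())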

Figures 11 and 12 show the skill Pearson correlations of the two students’ interactions on DKT and NSKT, respectively. Figures 11(a) and 12(a) show the skill Pearson correlations on DKT, and Figures 11(b) and 12(b) show the skill Pearson correlations on NSKT. It can be seen from the figures that DKT can only mine the correlation among the skills that have been answered in the past, indicating that DKT cannot effectively discover the relevance among knowledge concepts. As shown in Figures 11(b) and 12(b), NSKT can discover the correlation among four skills, while DKT can only do so among three. For example, it can be seen from Figure 11(b) that the Pearson correlation between two of these skills on NSKT for the first student’s interactions indicates a weak positive correlation between them.

Likewise, Figure 12(b) shows a pair of skills whose Pearson correlation on NSKT for the second student’s interactions indicates a strong positive correlation. Through the above examples, we can conclude that NSKT has a better ability to discover latent relevance among knowledge concepts than existing KT models.

7.3. Knowledge Concepts’ Discovery

NSKT can learn latent knowledge concept substructure among skills without expert annotations and can cluster related skills into a cluster, which denotes a knowledge concept (KC) class [6].

Figure 13 shows the visualization of using k-means to cluster the skill representation vectors, which have been projected to two dimensions by the t-SNE method [40, 41]. All skills are clustered into eight clusters, and each cluster can represent a knowledge concept class. Skills in the same cluster are labeled with the same color, and those skills have strong relevance and similarity. For example, skills that lie very close together in Figure 13 do have strong relevance and similarity, which further proves that NSKT has a stronger ability to discover latent skill relevance than existing KT models.
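A sketch of this visualization pipeline follows, assuming the skill representation vectors are available as a matrix; the function name and defaults are ours.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def cluster_and_project(skill_vectors, n_clusters=8, seed=0):
    # skill_vectors: (num_skills, d) representation matrix from the model.
    labels = KMeans(n_clusters=n_clusters,
                    random_state=seed).fit_predict(skill_vectors)
    coords = TSNE(n_components=2,
                  random_state=seed).fit_transform(skill_vectors)
    return labels, coords  # cluster id and 2-D position for each skill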

We have explored the latent conditional influence between exercises with

$J_{ij} = \dfrac{y(j \mid i)}{\sum_{k} y(j \mid k)},$

where $y(j \mid i)$ is the correctness probability assigned by NSKT to exercise $j$ when exercise $i$ is answered correctly in the first time step [6]. We show the latent conditional influence relationship among the exercises corresponding to the interactions in Figure 9(b), marked with arrow symbols in Figure 13. The line width indicates connection strength, and nodes may be connected in both directions. We only show edges with an influence greater than the threshold of 0.08. The attached ASSIST09 skill map is shown in Figure 13 (we only show 110 skills with skill names).
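Given a matrix of such conditional predictions, the influence measure can be computed as in the following sketch (the argument name is ours):

import numpy as np

def influence_matrix(y_cond):
    # y_cond[i, j] = y(j | i): correctness probability for exercise j
    # when exercise i is answered correctly in the first time step.
    y_cond = np.asarray(y_cond, dtype=float)
    return y_cond / y_cond.sum(axis=0, keepdims=True)  # J_ij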

8. Conclusion

In this work, we proposed a novel NTM-based skill-aware knowledge-tracing model for conjunctive skills, which captures the relevance among the multiple knowledge concepts of a question to predict students’ mastery of knowledge concepts (KCs) more accurately and to discover more latent relevance among knowledge concepts effectively. In order to better model the students’ knowledge state, we adopt neural Turing machines, which use an external memory matrix to augment memory ability. Furthermore, NSKT relates knowledge concepts (KCs) to their related knowledge concepts (RKCs) as a whole to enhance the model’s skill awareness and to improve prediction accuracy and prediction breadth. Experiments on real-world KT datasets demonstrate that the NTM-based skill-aware knowledge-tracing model (NSKT) outperforms existing state-of-the-art KT models in modelling the students’ knowledge state and discovering latent relevance among knowledge concepts.

For future studies, we will focus on mining hidden associations among knowledge concepts and building students’ personalized answering paths in intelligent tutoring systems. Furthermore, we will construct the holistic structure of knowledge concepts to enhance students’ understanding of how the overall knowledge affects each other.

Data Availability

The datasets used to support the findings of this study are included within the article and are also available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the CCF-AFSG Research Fund (Grant number: CCF-AFSG RF20200014) and the Science and Technology Project of Gansu (Grant numbers: 21YF5GA102, 21YF5GA006, 21ZD8RA008).