Abstract

In recent decades, predicting students' academic performance has attracted the attention of researchers aiming to address weaknesses and provide support for future students. To facilitate this task, educational data mining (EDM) techniques are utilized to construct prediction models from students' historical academic records. These models present the embedded knowledge in a form that is more readable and interpretable by humans. Hence, the contributions of this paper are threefold: (i) providing a thorough analysis of the selected features and their effects on the performance value using statistical analysis techniques, (ii) building and studying the performance of several classifiers from different families of machine learning (ML) techniques, and (iii) proposing an ensemble meta-based tree model (EMT) classifier for predicting student performance. The experimental results show that the EMT ensemble technique achieves a high accuracy of 98.5% (or 0.985), a superior result compared with the other techniques.

1. Introduction

With the rapid growth of large data warehouses, the need to analyse data and extract useful information has become a common concern and a rich topic of scholarly examination [1]. Data mining techniques are used as analytical tools to extract the hidden knowledge from such data warehouses in the form of models [2, 3]. Accordingly, many application areas adopt data mining techniques in their systems, such as finance, marketing, economics, telecommunications, medicine, healthcare, and student performance applications [4]. Given the importance of predicting students' performance and the digitization of university systems, academic institutes accumulate large volumes of data pertaining to students through computerized forms [5, 6]. It becomes essential for these institutes to transform this huge amount of data into useful knowledge [7], with the aim of helping instructors, employees, and authorities facilitate their missions through data analysis. Such knowledge, extracted by data mining methods [7], plays a vital role in pushing the wheel of education further [5].

Educational data mining (EDM) is a mature field concerned with applying data mining techniques, such as classification or clustering, to educational data in order to better understand the learning process and students' achievements [8, 9]. This is conducted by finding knowledge that is interpretable and understandable by humans [10]. In particular, classification techniques are used to analyse educational data in order to predict students' academic performance. The results enhance overall university performance and provide a more successful learning environment [10].

The problem of predicting student performance is therefore formulated in terms of input features, such as GradeID, topic, and raised hands, along with their related labels. The features represent an analysis of the student's historical data, whereas the label represents the actual performance. The observed records form the training set from which a classifier is derived. In particular, the formulation is as follows: assume that there is a set of students s ∈ S and their historical academic performance records r ∈ R, gathered during specific semesters. The relevant features F = (F1, F2, F3, …, Fn) are extracted from R for each student and associated with an academic performance P(s) in order to build a training set Ts. A classifier is then built by mapping the set of features to the corresponding label values using the training set Ts.
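This formulation can be made concrete with a small sketch. The feature names, the toy records, and the 1-nearest-neighbour rule below are illustrative assumptions for exposition only, not the paper's actual model:

```python
# Toy illustration of the formulation: each student s is described by a feature
# vector (F1, ..., Fn) extracted from historical records R, paired with a
# performance label P(s).  A classifier is any function learned from the
# training set Ts that maps a feature vector to a label.

def build_classifier(training_set):
    """training_set: list of (features, label) pairs, features a tuple of numbers."""
    def classify(features):
        # 1-NN rule: return the label of the closest training instance.
        def dist(f):
            return sum((a - b) ** 2 for a, b in zip(f, features))
        nearest_features, nearest_label = min(training_set, key=lambda t: dist(t[0]))
        return nearest_label
    return classify

# Ts: (raised_hands, visited_resources, absence_days) -> performance label
Ts = [((80, 90, 2), "High"), ((50, 40, 7), "Middle"), ((10, 5, 20), "Low")]
predict = build_classifier(Ts)
print(predict((75, 85, 3)))  # closest to the first record -> "High"
```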

The academic institutes aim to enhance the performance of the students by monitoring their historical records across different semesters. Gathering such information for decision making consumes time and effort, especially for many students. Many works have presented different approaches that help provide decisions for improving student performance. These approaches depend on building classifier models that determine the performance of unknown students using fewer features. Although these techniques improve the prediction performance, they do not provide a comprehensive and unified approach that determines the performance accurately. Therefore, the contributions are threefold and correspond to three research questions:

(1) Studying different features extracted from students' historical academic records and analysing the correlations between the features and their labels (i.e., student performance). These correlations determine the importance of each feature for predicting student performance.
RQ1: What is the relationship between the extracted features and student performance when building prediction systems?

(2) Examining several ML techniques from different theoretical families for predicting student performance, which indicates how diverse these techniques are and to what extent they help improve the performance.
RQ2: How effective are various ML techniques in predicting students' academic performance, and how can they be exploited to improve the performance of new students?

(3) Proposing an approach that combines two classification techniques, following the hypothesis that ensemble classifiers built from different techniques perform better than a single classification model. The proposed model, called the EMT prediction model, is built along with a thorough analysis of its performance on the academic student training set.
RQ3: What is the prediction performance of the EMT technique for estimating student performance?

The paper is organized as follows: Section 2 discusses related works and clarifies the weaknesses that motivate the assumptions of the proposed technique. Section 3 presents the methodology for studying the extracted features, examining the conducted ML techniques, and building an ensemble-based prediction model. Section 4 investigates the experimental settings, results, and discussion, followed by conclusions and future work in Section 5.

2. Related Works

Several studies have emerged to enhance student performance [5, 11, 12]. These either focus on the effect of various factors on student performance or build an appropriate classification model to predict future, unknown performance. This section therefore discusses both aspects across a body of research.

2.1. The Student Performance Factors

In the literature, many features that influence the final prediction of students' results have been studied. Table 1 summarizes the features most frequently used to predict student performance. Such features were extracted from students' historical records and have been used as input for constructing prediction models.

The researchers investigated the factors that influence the academic performance of students, especially those with low academic performance in educational institutions, as shown in Table 1. The factors gathered during students' academic life include CGPA (cumulative grade point average), internal assessment, students' demographic information, external assessments, extracurricular activities, high school background, social interaction network, and psychometric factors. The most frequent factors in the reviewed studies are CGPA, internal assessment, and students' demographic information such as gender, age, and salary income. External assessments, which include the behavior of students outside the class, are also used to predict student performance. Other important factors that have not been used as frequently in previous studies are extracurricular activities, high school background, social interaction network, and psychometric factors.

2.2. Prediction Methods

The aspect that has received the most attention in the area of student performance prediction is the prediction models used for classifying student performance. There are many variants of ML techniques, categorized into a set of families where each family reflects a specific theoretical idea. As discussed in the Introduction, the EDM field has studied different ML techniques to determine which of them obtain a high accuracy in predicting the future performance of students [7]. Table 2 summarizes the classification algorithms most often used to predict student performance on educational datasets; several works have investigated which algorithms best predict future student performance.

Romero et al. [28] found that the sequential minimal optimization (SMO), NaiveBayesSimple, and BayesNet techniques achieved the highest accuracy and F-measure values. Jishan et al. [15] indicated that neural network and Naive Bayes classification models using the SMOTE technique on an imbalanced dataset had an accuracy of 75%, while the Naive Bayes and neural network models produced almost the same accuracy level when the discretization method was applied. For decision trees, the results indicated that C4.5, CART, and ID3 were the best classifiers for predicting the performance of students [31].

A study was conducted in [20] to compare C4.5 with the multilayer perceptron and Naive Bayes. The results indicated that the Naive Bayes method has a good prediction accuracy among the compared classifiers. Similarly, the decision tree model (REPTree) had the highest prediction accuracy in [17].

As shown in Table 2, many researchers have recently achieved significant results in understanding the appropriate factors that influence students' achievement. They have also shown that all classifiers can predict student performance with reasonable results. However, the most fitting technique differs across the previous studies, since they use different datasets in different contexts, as shown in Table 1.

3. Ensemble Meta-Based Tree Model

The performance of students in academic institutions provides an indication of how much effort such institutions must expend to improve low or even medium performance. Employing ML techniques that exploit the historical data of students to predict unknown or future performance has received considerable attention, motivating us to construct a model for predicting the unknown labels of future instances. The proposed method, dubbed EMT, is an ensemble technique that combines the best-selected techniques into the final prediction model. A methodology is followed to show how the technique is constructed in an accurate manner. The methodology comprises three phases: the preprocessing phase, the construction phase, and the evaluation phase, as shown in Figure 1. These phases help increase the ability of the EMT model to predict student performance. Each phase is discussed as follows.

3.1. Preprocessing Phase

The constructed performance prediction model depends on historical records of the students as a training set. Thus, gathering such records is of the utmost importance for increasing the accuracy of the constructed model. The records are aggregated using a variety of techniques, such as student interviews, questionnaires, and online student evaluations. In most cases, the historical records are organized in unstructured forms, such as documents, which complicates the task of extracting patterns from them. Therefore, converting the documents into an appropriate structured form simplifies the job of ML techniques in building the predictive models. The training set maintains a set of features selected from the documents (or students' records) of 480 records, along with a student performance label of low (≤69%), middle (70% to 89%), or high (90% to 100%) [10]. The dataset includes sixteen attributes from a registration office holding information on preregistered students. These attributes are gathered during an academic semester. To clean the dataset, outliers such as inconsistent and missing values are removed. The dataset is thereby reduced to 13 attributes and 400 records, and the results are shown in Table 3 (for more details, see Appendix A).
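The cleaning and labelling steps above can be sketched as follows. The record fields and the drop-on-missing rule are simplifying assumptions for illustration; the paper's actual dataset has 13 attributes:

```python
# Sketch of the preprocessing step: drop records with missing values and bin a
# numeric mark into the three performance levels used above (Low <= 69,
# Middle 70-89, High 90-100).  The record fields are hypothetical.

def performance_level(mark):
    if mark >= 90:
        return "High"
    if mark >= 70:
        return "Middle"
    return "Low"

def clean(records):
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # treat missing values as outliers and drop the record
        rec = dict(rec, performance=performance_level(rec["mark"]))
        cleaned.append(rec)
    return cleaned

raw = [
    {"gender": "F", "raised_hands": 80, "mark": 95},
    {"gender": "M", "raised_hands": None, "mark": 75},  # missing value -> removed
    {"gender": "M", "raised_hands": 30, "mark": 60},
]
print([r["performance"] for r in clean(raw)])  # ['High', 'Low']
```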

3.2. Construction Phase

The training set is now ready to be used for constructing the predictive models of student performance. There is a set of ML techniques from different families that can be used to construct such models. These families differ from each other in the theoretical process used for building the model, and the resulting models can be deployed in systems for classifying future instances. Machine learning techniques construct a hypothesis from a space of hypotheses using the training set. Each hypothesis is represented as a mathematical model (or pattern) that maps input instances to output labels. Models are classified into groups that share the same theory but differ slightly in technical details. In this paper, a set of learning techniques selected from different families is used. The classifiers, from a family perspective, are summarized as follows, as also used in the related works [32]:

(1) Bayes: This family relies on probability theory for building a classifier, in the form of rule-based or network models reflecting the probability of classes given specific features. In particular, the model uses the prior probability, the probabilities of observing the data given the hypothesis, and the observed data itself in the construction process; examples include NaiveBayes and BayesNet.

(2) Functions: The classifiers in this family aim to construct a model as a function from input features to output labels. There are many techniques for mapping the inputs to the outputs, such as the neural network, which uses feed-forward and backpropagation methods to update the network and enhance the prediction by reducing the error of the loss function. Other techniques include logistic regression and the support vector machine (i.e., hyperplanes).

(3) Lazy: These classifiers use the training set directly to find the output label from the most similar instances. For example, the features of a future instance are compared with the training set features, and the labels of the most similar instances are used to predict the value of that future instance. Techniques in this family include IBk, KStar, and LWL.

(4) Meta: This family suggests that using a set of ensemble techniques increases the performance of the prediction models. Notably, an ensemble classifier is built by combining a set of weak learning classifiers, either by voting or by weighting their results, where each classifier is built from a random sample of the training set. Techniques in this family include AdaBoost, Bagging, and LogitBoost.

(5) Trees: This family builds classifiers in the form of trees, where the internal nodes represent attributes, the leaf nodes represent labels, and the arcs (or edges) represent the values of the parent attribute. An attribute is selected using methods based on entropy information, reflecting the importance of an attribute with low disorder. Techniques in this family differ in their selection methods and include the decision tree (or J48), LMT, and random forest.
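As a sketch, the five families can be instantiated with scikit-learn analogues of the Weka algorithms named above (GaussianNB for the Bayes family, logistic regression for functions, k-NN for lazy, AdaBoost for meta, and a decision tree as a rough J48 counterpart). The data here are purely synthetic, and the exact algorithm choices are assumptions for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB            # Bayes family
from sklearn.linear_model import LogisticRegression   # functions family
from sklearn.neighbors import KNeighborsClassifier    # lazy family
from sklearn.ensemble import AdaBoostClassifier       # meta family
from sklearn.tree import DecisionTreeClassifier       # trees family (rough J48 analogue)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))                 # 4 toy features per student
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy binary performance label

families = {
    "Bayes": GaussianNB(),
    "Functions": LogisticRegression(max_iter=1000),
    "Lazy": KNeighborsClassifier(n_neighbors=3),
    "Meta": AdaBoostClassifier(n_estimators=25, random_state=0),
    "Trees": DecisionTreeClassifier(max_depth=4, random_state=0),
}
for name, clf in families.items():
    score = clf.fit(X, y).score(X, y)         # training accuracy, illustration only
    print(f"{name}: {score:.2f}")
```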

The EMT is an ensemble bagging technique that combines two methods, one from boosting and the best ML technique across all families, into a predictive model as shown in Figure 1. Boosting is among the most popular and powerful data mining methods and obtains a high prediction performance by combining several weak learners. It is an iterative process, in which a classifier is generated as a weak learner at each iteration. An ensemble boosting classifier then combines these weak learners, each with a coefficient weight, into a strong prediction model with high accuracy. In this paper, the AdaboostM1 method is used to train more than one learner to solve the student performance prediction problem, and the best prediction model among these learners is selected as one of the two EMT methods. Bagging methods, on the other hand, evaluate the output of the weak learners using two techniques: voting and averaging. In the voting method, the most popular class among the weak learners is used as the final output of the bagging learner, while the averaging method takes the average of the weak learners' values as the final output.
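The boosted combination described here can be written compactly in the standard AdaBoost.M1 form, restated from the general literature rather than taken from the paper:

```latex
% Weighted vote of T weak learners h_t over the candidate labels y;
% \epsilon_t is the weighted training error of h_t at iteration t.
H(x) = \arg\max_{y} \sum_{t=1}^{T} \alpha_t \, [\![\, h_t(x) = y \,]\!],
\qquad \alpha_t = \ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right).
```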

In summary, the EMT technique combines the best classifier across all families on the same training set with the best learner of the AdaboostM1 method, as a bagging technique that applies the voting method between the two.
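A minimal sketch of this combination using scikit-learn analogues: AdaBoost over a depth-limited decision tree stands in for AdaboostM1 over J48, and, since scikit-learn has no NBTree, a second plain decision tree stands in for the best tree-family member. All data and parameter values are synthetic assumptions:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
# Three toy performance classes (0 = low, 1 = middle, 2 = high).
y = np.where(X[:, 0] + X[:, 2] > 0.5, 2, np.where(X[:, 0] > -0.5, 1, 0))

# First member: boosting over a tree learner (an Adaboost_J48 analogue).
ada_tree = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                              n_estimators=30, random_state=0)
# Second member: the best tree-family classifier (standing in for NBTree).
best_tree = DecisionTreeClassifier(max_depth=5, random_state=0)

# EMT analogue: majority (hard) voting between the two members.
emt = VotingClassifier(estimators=[("ada_j48", ada_tree), ("nb_tree", best_tree)],
                       voting="hard")
emt.fit(X, y)
print(round(emt.score(X, y), 3))              # training accuracy, illustration only
```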

3.3. Evaluation Phase

In order to evaluate the EMT technique and generalise it to future instances, the 10-fold cross-validation technique is used. The evaluation model part in Figure 1 demonstrates this technique. As shown, each of the 10 parts holds the same number of instances, assigned at random. At each iteration, one part is used as a validation set, while the other parts are combined into a training set. The training set is used for building the classifier, whereas the validation set is used to evaluate the classifier by predicting the class of its instances. A set of evaluation metrics is used to examine the performance of the constructed classifier model at each iteration, and the values over all iterations are averaged into a final evaluation value that serves as an estimator of the classifier performance. From the metric perspective, the EMT technique is evaluated using the accuracy, F-measure, and ROC metrics. The accuracy metric is the number of instances classified correctly over all instances across the three classes, which are low, medium, and high performance. The F-measure combines the benefits of two evaluation metrics, Precision and Recall. Precision (also called positive predictive value) is the percentage of retrieved instances that are relevant, while Recall (also known as sensitivity) is the percentage of relevant instances that are retrieved out of the total number of relevant ones. The ROC area (or curve) examines how well the classifier can differentiate between positive and negative instances and identifies a threshold for separating them. These metrics are computed from the confusion-matrix counts defined below.
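The 10-fold protocol described above can be sketched in a few lines of pure Python. A majority-class baseline stands in for the real classifier, and fold assignment is round-robin rather than shuffled, both simplifying assumptions:

```python
from collections import Counter

def ten_fold_accuracy(instances, k=10):
    """instances: list of (features, label) pairs.  Returns the mean held-out
    accuracy of a majority-class baseline classifier across k folds."""
    folds = [instances[i::k] for i in range(k)]  # round-robin split into k parts
    scores = []
    for i, validation in enumerate(folds):
        # All parts except the i-th form the training set for this iteration.
        training = [inst for j, fold in enumerate(folds) if j != i for inst in fold]
        majority = Counter(label for _, label in training).most_common(1)[0][0]
        correct = sum(1 for _, label in validation if label == majority)
        scores.append(correct / len(validation))
    return sum(scores) / k  # average over iterations, as described in the text

data = [((i,), "High" if i % 3 else "Low") for i in range(100)]
print(round(ten_fold_accuracy(data), 2))  # -> 0.66
```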

TP refers to true positive, the number of instances correctly labelled as belonging to the positive class. TN refers to true negative, the number of negative instances correctly classified as negative. FP refers to false positive, the number of negative instances incorrectly classified as positive. FN refers to false negative, the number of positive instances incorrectly classified as negative.
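In terms of these counts, the metrics above take their standard binary-case forms (restated here for completeness; in the three-class setting they are computed per class and averaged):

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN},
\qquad
F\text{-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}
                        {\mathrm{Precision} + \mathrm{Recall}}.
```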

4. Results and Discussion

The experiments are conducted to cover the research questions and to verify that the proposed technique for predicting student performance achieves a highly effective performance. The research questions address three important issues: (1) feature analysis, (2) analysis of the prediction models, and (3) the proposed technique, called the ensemble meta-based tree model (EMT model).

4.1. Features Analysis

In order to evaluate the features and examine their effect on students' academic performance (i.e., the class labels), the Pearson correlation method is used as an analytical technique [33]. Figure 2 shows the results of applying the Pearson correlation method to the prediction of students' final grades. The attribute of visited resources has the highest correlation at 0.436, followed by student absence days (0.399), raised hands (0.376), announcement views (0.330), relation (0.312), parent answering survey (0.272), parent school satisfaction (0.196), gender (0.164), discussion (0.160), stage (0.077), semester (0.075), and finally topic (0.060). The results indicate that the predictor variables have a statistically significant correlation (p value < 0.01) with the class value.
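For reference, Pearson's r for one feature against a numerically coded class label can be computed directly. The sample values below are made up for illustration and are not taken from the paper's dataset:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: visited-resources counts against performance coded
# Low = 0, Middle = 1, High = 2.
visited = [5, 12, 30, 45, 60, 80, 85, 90]
performance = [0, 0, 1, 1, 1, 2, 2, 2]
print(round(pearson_r(visited, performance), 3))
```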

The results, as shown in Figure 2, reflect positive correlations between all independent attributes and the dependent attribute (or class). The correlations range from 0.06, the lowest value, for the topic to 0.436, the highest value, for visited resources. It is worth mentioning that the factor of visiting resources highly affects the final results of students and can raise them to middle or high results instead of low results. Therefore, academic institutions should focus on this factor by facilitating the service and encouraging their students to visit the available resources. On the other hand, the topic shows a low correlation due to the varied contents of courses, which depend on the syllabus and the methods the instructors use during class. These results may help instructors, administrators, and policymakers make better decisions regarding the learning process. Finally, these results answer RQ1 affirmatively: each factor relates in a different way to the prediction class.

4.2. Prediction Model Analysis

Many researchers have recently achieved significant results in understanding the appropriate factors that influence students' achievement. The question is how much effect these factors have on the classification methods. In this section, 47 classification techniques are evaluated on the dataset, whose factors are the extracted features along with the labels that represent student performance. To evaluate the classification techniques, the evaluation phase is followed as discussed in Section 3. As shown in Table 4, the classifiers obtain consistent results for both the F-measure and accuracy measures. These results show that the extracted features play a role in deriving prediction models from students' historical records with reasonable accuracy.

Regarding the classification outcomes, Table 4 shows the best algorithms of the different families, where each family extracts patterns from the training set in its own way. Each family of ML techniques follows a theoretical framework for building the required model, so it is natural that the constructed models identify the target classes (i.e., low, medium, and high) differently and provide diverse results. The best model of each family is PART (rules), A2DE (Bayes), multilayer perceptron (functions), LocalKnn (lazy), and J48 (trees), with 91.8%, 89.5%, 91%, 92.8%, and 94.3% accuracy, respectively (see Appendix B). The most effective family in predicting student performance is the tree family, with high accuracy and F-measure values compared to the other families. In particular, the J48 algorithm outperforms all 46 other classifiers with consistent results after applying the 10-fold cross-validation method, correctly classifying 377 of 400 instances, which is 94.3%. The consistent results of the tree family relate to the feature selection criteria used to select the relevant features for building the tree model. The most correlated feature, such as visited resources, is selected as the root of the tree, while the other features are selected subsequently based on their effectiveness down to the leaf nodes, which represent the decision classes. This analysis of ML techniques and their effects on predicting student performance answers research question RQ2, which studies the ability of ML techniques in EDM to be deployed in academic institution systems. Hence, it is advisable to find the most promising and superior technique for a specific school.
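The entropy-based selection criterion described here can be sketched: the feature with the highest information gain (for J48, strictly the gain ratio) is placed at the root. A minimal pure-Python version over a toy table with made-up values:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, feature, labels):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy(labels)
    remainder = 0.0
    for value in set(r[feature] for r in rows):
        subset = [lbl for r, lbl in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Toy table: binned visited resources predicts the class perfectly, while the
# semester does not -- so visited_resources would be chosen as the root.
rows = [
    {"visited_resources": "high", "semester": "first"},
    {"visited_resources": "high", "semester": "second"},
    {"visited_resources": "low", "semester": "first"},
    {"visited_resources": "low", "semester": "second"},
]
labels = ["High", "High", "Low", "Low"]
print(information_gain(rows, "visited_resources", labels))  # 1.0 (perfect split)
print(information_gain(rows, "semester", labels))           # 0.0 (uninformative)
```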

4.3. Ensemble Meta-Based Tree Model

The proposed EMT model essentially combines two consistent ML techniques into a voting bagging technique. The idea is that aggregating the most effective algorithms into one technique would enhance the prediction models. The EMT technique performs as an ensemble method to obtain more accurate prediction results than traditional learning methods that focus on a single learning algorithm [34, 35].

Table 5 shows the results of using the best classifiers of the different families alone and as functions in the boosting and bagging methods. As shown, the classifiers in the boosting method reveal a higher accuracy in comparison with the bagging method using the accuracy, F-measure, and ROC area metrics. In particular, the J48 classifier obtains consistent results when used as a function in the boosting method, dubbed Adaboost_J48, which increases the accuracy from 0.943 to 0.983. Consequently, the J48 classifier used as a function in the boosting method becomes the first member of the EMT model, to be combined with the best classifier over all the other classification models.

Combining multiple effective classifiers into one accurate prediction model, instead of using a single classifier, obtains a high prediction performance for the students [35]. Accordingly, the EMT model ensembles a set of effective classifier models derived by aggregating the first member of the EMT model with the best tree-based classifier, using the voting method of the bagging technique, as shown in Table 6. The results range between 0.950 for BFTree voted with Adaboost_J48 and 0.985 for NBTree voted with Adaboost_J48. The NBTree algorithm shows the robustness of integrating the Naïve Bayes technique into a decision tree for increasing the performance metric.

Figure 3 describes the results regarding the performance of students at the academic institution. In Figure 3(a), the accuracy of the NBTree classifier is 0.925. Figure 3(b) shows the output of the J48 algorithm; owing to the larger number of correctly classified instances, its total accuracy reaches 0.943. Figure 3(c) shows the confusion matrix and performance of the EMT model, whose accuracy increases to 0.985. This indicates a tangible improvement in predicting the performance of students, obtained by voting between the boosted J48 (Adaboost_J48) and NBTree. This combination leads to the highest accuracy of 0.985, unlike the accuracy results recorded when using the J48 algorithm or the boosting method alone.

The remaining question concerns how valuable the EMT model is as a promising method for predicting the performance of students. The answer is evident in Table 6 and Figure 3, which in particular confirm RQ3. The main purpose of this work is to propose more effective and accurate prediction models that enhance the quality of educational institutions once deployed. The models would be used to identify potential students who may achieve low performance and to understand why they achieve such results. Thus, the EMT prediction model and its results would help instructors, administrators, and policymakers make better decisions, especially for those students who are expected to achieve low-level performance [5, 36]. Such decisions aim to reduce the number of students expected to achieve low-level performance and to develop the learning system by adding new tools that improve students' academic performance [5].

5. Conclusion and Future Work

Educational data mining (EDM) uses analytical tools from the data mining field to explore the unique types of datasets found in the academic field. These tools convert educational system data that are not readable by a human into interpretable and valuable information that may have a significant impact on educational research. In this paper, the EDM field is used to predict students' academic performance. The contributions are threefold: (i) examining a set of features that reflect the performance of students using Pearson correlation, (ii) evaluating a set of learning models using ML techniques on a dataset of 400 instances collected over two semesters and used in the construction phase (the dataset includes thirteen attributes from a registration office holding preregistered students' information, along with student performance labelled as low, middle, or high), and (iii) proposing a prediction model, called the EMT model, for evaluating student performance on the same dataset. The EMT model is constructed by combining the most effective techniques among the 47 learning techniques studied.

The results show correlations between the independent features and the final achievement of students ranging from 0.060 to 0.436, which answers RQ1. Based on the 10-fold cross-validation method, a comparison between different ML techniques is conducted, using the accuracy, F-measure, and ROC metrics for evaluation. The results show that the PART, A2DE, multilayer perceptron, LocalKnn, and J48 algorithms, selected from five different families of classifiers, have accuracy values of 91.8%, 89.5%, 91%, 92.8%, and 94.3%, respectively, which ensures the validity of RQ2. Finally, based on these ML results, J48 was selected to pursue the main goal, RQ3. Further experiments enhanced the results of the best-selected classifier, J48, by applying the ensemble method and voting the results with the tree family algorithms. The results show a significant improvement using the proposed EMT model: the proposed algorithm achieves an accuracy of up to 98.5%, and RQ3 is consequently answered.

The main purpose of the work is to improve the quality of educational institutions. The improvements are achieved by deploying the proposed predictive model, which would be used to predict the performance of students, especially those with low performance, and to understand the reasons behind such results. As a result, the model provides the academic institute, including its management and teachers, with an accurate evaluation of their students at an early stage of the learning process to prevent them from getting a low result. It also helps to better allocate both resources and staff, which increases the effectiveness of education development.

In future work, the research suggests using more features, such as examining how the use of social media or babysitting would affect the performance of the students. In addition, extra experiments could be conducted using other data mining techniques, such as clustering.

Appendix

A. The Description of the Dataset

In this study, 400 students' records with 13 features (or attributes) were used as the dataset. Figure 4 illustrates the definitions of the features and their distributions, along with their value ranges and types.

B. The Confusion Matrix Results for the Best Five Classifiers

In this part, the confusion matrix is adopted as a tool to analyse the best five classifiers of each family by comparing the actual and the target outputs. The results shown in Figure 5 indicate that the best classifiers are the PART, A2DE, multilayer perceptron, LocalKnn, and J48 algorithms, with 367, 358, 364, 371, and 377 correctly classified instances, respectively.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.