Developing Multiagent E-Learning System-Based Machine Learning and Feature Selection Techniques

Hessen, Shrouk H.; Abdul-kader, Hatem M.; Khedr, Ayman E.; Salem, Rashed K.

doi:https://doi.org/10.1155/2022/2941840

Computational Intelligence and Neuroscience


Special Issue

Artificial Intelligence and Machine Learning-Driven Decision-Making


Research Article | Open Access

Volume 2022 | Article ID 2941840 | https://doi.org/10.1155/2022/2941840


Shrouk H. Hessen,¹,² Hatem M. Abdul-kader,¹ Ayman E. Khedr,³ and Rashed K. Salem¹

Academic Editor: Ahmed Mostafa Khalil

Received 16 Dec 2021

Accepted 03 Jan 2022

Published 30 Jan 2022

Abstract

Recently, the artificial intelligence (AI) domain has expanded to include finance, education, health, and mining. AI governs the performance of systems that use new technologies, especially in the educational environment. The multiagent system (MAS) is considered an intelligent system that facilitates the e-learning process in the educational environment. MAS makes interaction among agents easy, which supports the use of feature selection. Feature selection methods select the important and relevant features from the dataset, which helps machine learning algorithms achieve high performance. This paper proposes an effective multiagent system based on machine learning algorithms and feature selection methods to enhance the e-learning process in the educational environment by predicting pass or fail results. The univariate and Extra Trees feature selection methods are used to select the essential attributes from the dataset. Five machine learning algorithms, namely, Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and K-nearest neighbors (KNN), are applied to all features and to the selected features. The results showed that the learning algorithm trained on the features selected by the Extra Trees method achieved the highest performance in both cross-validation and testing.

1. Introduction

Over the last two years, global disasters have forced people to rely on technology to access services remotely [1]. Technology allows users to complete tasks at low cost and in less time. Artificial intelligence (AI) is currently a trending topic, and it enables machine learning to be applied for efficiency and performance [2]. Education, health, industry, and finance use AI to develop their fields. The rapidly growing educational environment needs machine learning techniques, which are considered one facet of AI [3].

Enhancing learning systems, especially e-learning systems, has become necessary for the educational environment. Our aim is to use an intelligent system to organize the e-learning process. Continuous changes in the e-learning process have led to the emergence of techniques suited to students’ requirements [4–6]. An e-learning system lets students use its benefits anywhere and at any time. Such a system needs the support of a multiagent model to cover the shortcomings of the educational environment. Because the e-learning process involves many attributes, a multiagent system is the most suitable solution for the e-learning system. Agents can interact with other agents in the same environment, so a multiagent system allows integration between agents [7, 8]. The multiagent system also allows e-learning attributes to interact and reveals the relationships among them. In the e-learning system, students can use many features to enhance their performance [9]. The proposed multiagent system could help e-learning systems improve various tasks for students. AI systems can enhance student performance through feature selection methods [10, 11].

Machine learning algorithms are used to analyze data and make predictions that support the best decision [3]. They play an essential role in many fields [12, 13], especially education [14, 15], where they work together with feature selection algorithms. Feature selection algorithms retain only the features that are relevant for accurate prediction [16, 17]. Many feature selection algorithms can be used to improve efficiency by revealing irrelevant features.

This paper proposes an education system that uses multiagents to study how interacting agents enhance e-learning. We integrated different agents (course, student, and different activities) and applied different feature selection methods to select the attributes that play the most important role in enhancing the e-learning process. We applied five machine learning algorithms to the selected features and evaluated their performance using different metrics to assess the effect of the feature selection methods on the performance of the educational process.

The next section reviews related work on predicting performance with machine learning algorithms and on the features that affect prediction in education systems. Section 3 presents the main steps of the proposed system. Section 4 discusses the results of applying ML algorithms to the selected features. Section 5 summarizes the paper.

2. Related Works

Machine learning (ML) is the part of artificial intelligence (AI) that enables a machine to learn from data in order to complete a task efficiently. It is considered a backbone of AI approaches for developing predictions and enhancing performance [2, 18].

Feature selection (FS) is considered an important step before deploying machine learning algorithms [16, 17]. It selects only the relevant and essential features from the data and ignores redundant ones [19]. Many researchers have used feature selection methods and machine learning algorithms to improve the educational process. For example, in [20], the authors proposed a learning system that applies a fuzzy methodology to detect student failure. The factors that affect performance are the students’ activity, their subjects, and their educational background. The authors used a multicriteria fuzzy algorithm to rank students and thereby predict their scores. The dataset covered 3 institutions and contained 131 students with 22 attributes.

In [21], the authors used machine learning algorithms to predict the performance of students in the Faculty of Computer Science and Information Technology. They applied supervised machine learning algorithms to predict examination results in two steps: a preprocessing step that prepares and cleans the data, followed by a prediction step that applies the machine learning algorithms. Of the several supervised algorithms they used, the logistic regression classifier gave the best results on 498 students.

In [22], the authors compared a decision tree algorithm with three other algorithms. They used the Weka tool to test the collected data and predict failure and success. They examined which features affect model accuracy in order to identify the relevant ones, and feature selection retained five relevant features out of ten. After using several popular machine learning algorithms (J48, Random Tree, and RepTree), they recommended the decision tree algorithm as the best solution for high accuracy.

In [19], the authors studied feature selection with supervised machine learning algorithms in higher education. They ran their experiments in the Weka mining tool, which is among the most popular tools for mining. The dataset consists of 11 features, selected out of 45, that are used to predict the student’s country of residence, and the models were trained and tested with three methods: K-Fold, Hold-Out, and Leave One Out. Leave One Out obtained high accuracy with Random Forest, and the GRAE algorithm further improved the results and obtained the highest accuracy.

In [23], the authors proposed the Generalized Feature Selection (GeFeS) method, which is based on a genetic algorithm, to choose a subset of unique and important features. The method uses an efficient and fast prediction method to optimize performance for high accuracy and minimal cost. A Genetic Algorithm (GA) with a sequence of operators was used to make the selection more relevant and intelligent. The GA operators increase the method’s capability to handle both small-scale and large-scale datasets. The method succeeded in increasing accuracy and F-measure, and the results were compared with other feature selection methods. On the same datasets, the proposed algorithm showed higher performance than the methods used previously.

In [24], two feature selection methods (CHI and MI) are combined to evaluate the scores of features. The new feature scores were normalized and then used to measure student performance in the education process, since the student is considered an important agent among the multiagents found in the educational sector. The study compared different predictive models and reported the accuracy of each model.

3. Methodology

The proposed multiagent framework for the e-learning educational system is shown in Figure 1. It consists of the following steps: data collection, dataset preprocessing, dataset integration, feature selection, dataset splitting, training and optimizing ML algorithms, and evaluating ML algorithms. Each step is described below.

3.1. Data Collection

We used the Open University Learning Analytics dataset [25] for our experiments. This dataset contains seven agents stored as CSV files, and each file contains a table with several features:
(1) Courses: the courses that students study per semester
(2) Assessment of students: the results of all assessments submitted by students after completion
(3) Information about students: the student’s basic information
(4) Registration of students: the date on which students are allowed to register for the course
(5) Virtual learning environment of students: the recorded interactions of students with each course
(6) Virtual learning environment: the course materials, which come in different types and styles of learning; each student can access them, and the students’ activity is recorded
(7) Assessment: the evaluation of students during the semester, which contains the results of all submitted assignments

The dataset includes many learning and activity types that are available to students in each course. It records the interactions of 32,593 students with 19 activity types and their styles in 22 courses. In our work, we study the impact of integrating four agents: courses, students’ information, virtual learning environment of students, and the VLE. These integrated agents are used to improve the learning system in the educational process. The following sections describe the preprocessing steps applied to the dataset and propose a multiagent e-learning system built from the four integrated agents. Table 1 describes each agent and its attributes.

3.2. Preprocessing Dataset

We cast the task as a binary classification problem. The student’s info table includes a class label with four values: pass, fail, withdrawn, and distinction. Distinction is converted into pass, and withdrawn is converted into fail. We merged the student’s vle table with the VLE table into one table called student’s learning style and activity: the names of the learning and activity types in the VLE table are extracted and added as attributes to the student’s vle table, and each attribute is filled with the total number of clicks for each student in a course.
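The two preprocessing operations described above can be sketched with pandas as follows. This is a minimal illustration on toy tables, not the authors’ code: the column names (`final_result`, `id_site`, `activity_type`, `sum_click`) and the label spellings are assumptions about how the dataset files are laid out, and the 1/0 encoding of pass/fail is chosen here for convenience.

```python
import pandas as pd

# Toy stand-ins for three of the dataset tables; all column names and
# values below are assumptions made for illustration.
student_info = pd.DataFrame({
    "id_student": [1, 2, 3, 4],
    "code_module": ["AAA", "AAA", "BBB", "BBB"],
    "code_presentation": ["2013J"] * 4,
    "final_result": ["Pass", "Distinction", "Withdrawn", "Fail"],
})
vle = pd.DataFrame({
    "id_site": [10, 11, 12],
    "activity_type": ["homepage", "quiz", "forumng"],
})
student_vle = pd.DataFrame({
    "id_student": [1, 1, 1, 2, 3],
    "code_module": ["AAA", "AAA", "AAA", "AAA", "BBB"],
    "code_presentation": ["2013J"] * 5,
    "id_site": [10, 10, 11, 12, 10],
    "sum_click": [5, 3, 7, 2, 4],
})

KEYS = ["id_student", "code_module", "code_presentation"]

# Collapse four outcomes into two classes:
# Distinction -> pass (1), Withdrawn -> fail (0).
label_map = {"Pass": 1, "Distinction": 1, "Fail": 0, "Withdrawn": 0}
student_info["label"] = student_info["final_result"].map(label_map)

# Attach the activity type to every click record, then total the clicks
# per student and course, with one column per activity type.
clicks = student_vle.merge(vle, on="id_site", how="left")
activity = clicks.pivot_table(
    index=KEYS, columns="activity_type", values="sum_click",
    aggfunc="sum", fill_value=0,
).reset_index()
activity.columns.name = None
print(activity)
```

Activity types that a student never used appear as 0 clicks rather than as missing values, which keeps the resulting table usable by classifiers that do not accept NaN.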

3.3. Integrating Dataset

The tables are combined using a left join. The student’s learning style and activity table is integrated with the student’s info table, which contains the attributes “id student,” “code module,” and “code presentation,” by applying a left join.
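A left join of this kind might look as follows in pandas. The sketch assumes that the three named attributes serve as the join keys and that the learning style and activity table is the left-hand table; the text does not state either point explicitly, and the toy rows are invented.

```python
import pandas as pd

# Assumed join keys (underscored forms of the attribute names in the text).
KEYS = ["id_student", "code_module", "code_presentation"]

# Hypothetical activity table: one row per student and course.
activity = pd.DataFrame({
    "id_student": [1, 2, 5],
    "code_module": ["AAA", "AAA", "BBB"],
    "code_presentation": ["2013J", "2013J", "2014B"],
    "homepage": [8, 0, 4],
    "quiz": [7, 2, 0],
})
# Hypothetical student info table carrying the class label.
student_info = pd.DataFrame({
    "id_student": [1, 2],
    "code_module": ["AAA", "AAA"],
    "code_presentation": ["2013J", "2013J"],
    "label": [1, 0],
})

# Left join: every row of the left table is kept; rows without a match
# in student_info receive NaN in the columns that come from it.
merged = activity.merge(student_info, on=KEYS, how="left")
print(merged)
```

With a left join, the choice of left-hand table decides which unmatched rows survive, so rows with a missing label would need to be dropped or handled before training.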

3.4. Feature Selection Methods

Feature selection techniques identify and select the most essential and highest-ranked features in the dataset, which helps machine learning algorithms achieve the best performance. Two methods are used, namely, the univariate and Extra Trees feature selection methods:
(i) Univariate feature selection selects the best features based on univariate statistical tests. Each feature receives its own score and rank, so the highest-scoring features are easily selected as the best features.
(ii) Extra Trees builds each of its decision trees from the original data sample. At each test node, each tree is provided with a random subset of features, from which it must select the most relevant feature to split the data according to a mathematical criterion [26, 27].
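Both methods are available in scikit-learn, and a minimal sketch of how they could be applied is given below. The data are synthetic click counts, the feature names are placeholders, and the chi-squared test is an assumption: the text does not name the univariate statistical test that was used.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)

# Synthetic click counts: the first two features depend on the label,
# the third is pure noise. Names are placeholders.
X = np.column_stack([
    rng.poisson(5 + 20 * y),
    rng.poisson(5 + 10 * y),
    rng.poisson(10, size=n),
])
names = ["homepage", "quiz", "noise"]

# Univariate selection: score each feature on its own and keep the k best.
# chi2 requires non-negative features, which click counts satisfy.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
univariate_top = [names[i] for i in selector.get_support(indices=True)]

# Extra Trees: rank features by impurity-based importance.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(forest.feature_importances_)[::-1]
extra_trees_top = [names[i] for i in order[:2]]

print(dict(zip(names, selector.scores_.round(1))))
print(univariate_top, extra_trees_top)
```

The two methods produce values on different scales: unbounded test statistics for the univariate method and importances that sum to 1 for Extra Trees, so only the orderings are comparable.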

3.5. Splitting Dataset

The integrated dataset is partitioned into a training set and a testing set. The training set is used to optimize the ML algorithms through grid search with stratified cross-validation. The testing set is used to evaluate the performance of the ML algorithms with four metrics: accuracy (A), precision (P), recall (R), and F-measure (F). The cross-validation and testing results are recorded for each ML algorithm.
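The partitioning step can be illustrated with scikit-learn as below. The 80/20 ratio is taken from Section 4.1; the synthetic data, the use of stratification in the hold-out split, and the choice of five folds are assumptions made for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic, imbalanced stand-in for the integrated dataset.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=0
)

# Hold out 20% as unseen test data, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stratified k-fold on the training set: each fold keeps roughly the
# same proportion of each class as the full training set.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (fit_idx, val_idx) in enumerate(skf.split(X_train, y_train)):
    print(fold, len(fit_idx), len(val_idx), round(y_train[val_idx].mean(), 3))
```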

3.6. Training and Optimizing ML Algorithms

Grid search with cross-validation (CV) is used to optimize the ML algorithms and enhance their performance. Grid search is a technique for determining the hyperparameters that give an ML algorithm its best results. CV splits the dataset into k subsets so that the ML algorithms can be trained on k-1 subsets (the training set) and tested on the remaining subset. ML algorithms are used to develop the multiagent e-learning system. These algorithms are as follows:
(1) Naive Bayes (NB) is a supervised classification approach that assumes the features are independent of one another. NB estimates the relevant parameters, so it is considered one of the effective classification techniques [28].
(2) Random Forest (RF) is a machine learning model for classification problems that is used because of its flexibility. It builds many decision trees during training and then averages the predictions of the trees. RF was used to estimate accuracy in the exploratory data analysis (EDA) step, and it can handle large datasets. It is an effective way to deal with a very large number of features and to produce feature-based estimates [29, 30].
(3) Decision Tree (DT) is one of the most popular supervised classification algorithms. It is a graph of nodes and branches in which each internal node tests a feature, each branch carries an outcome of that test, and each leaf is assigned a class label. DT is a top-down approach that starts from the root of the tree and splits into branches to reach a prediction (decision) [28, 31]. It is one of the most common algorithms because it can identify a solution accurately and quickly.
(4) Logistic Regression (LR) is a regression algorithm used for prediction that models the relationship between the dependent variable and the independent variables [32].
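Grid search with cross-validation, as described at the start of this section, can be sketched with scikit-learn’s `GridSearchCV`. The parameter grid, the number of folds, and the synthetic data are hypothetical; the text does not list the hyperparameters that were searched. Random Forest is used here only as an example estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the integrated dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: every combination (2 x 2 = 4) is scored by
# stratified 5-fold cross-validation on the training set.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best combination;
# score() evaluates the refitted best model on the held-out test set.
print(search.best_params_, round(search.best_score_, 3))
print(round(search.score(X_test, y_test), 3))
```

Keeping the test set out of the search means the testing result is measured on data that played no part in choosing the hyperparameters.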

3.7. Evaluating ML Algorithms

Four standard metrics are used to evaluate the ML algorithms: accuracy (A), precision (P), recall (R), and F-measure (F). They are computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
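For reference, the standard definitions of the four metrics in terms of these counts are given below. That the paper uses exactly these forms is an assumption. In particular, the reported recall values coincide with accuracy in most experiments, which is what class-weighted averaging produces, so the reported figures may be weighted averages over the pass and fail classes.

```latex
\begin{align}
A &= \frac{TP + TN}{TP + TN + FP + FN}, \\
P &= \frac{TP}{TP + FP}, \\
R &= \frac{TP}{TP + FN}, \\
F &= \frac{2 \cdot P \cdot R}{P + R}.
\end{align}
```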

4. Experiments and Results

4.1. Experiment Setup

The experiments were run in Python 3, and the ML models were implemented using the scikit-learn package. The ML algorithms were optimized using grid search with cross-validation. The dataset was partitioned into two parts: an 80% training set for optimizing the models and recording cross-validation results, and a 20% testing set (unseen data) for evaluating the models and recording testing results. We conducted various experiments to study the effect of learning and activity types in the educational process using feature selection methods and five ML algorithms: DT, KNN, NB, LR, and RF. First, the feature selection methods were applied to the dataset to determine the important features. Second, the ML algorithms were applied to the full feature set. Third, the ML algorithms were applied to the top thirteen features, those with the highest scores. Fourth, the ML algorithms were applied to the top six features, those with the highest scores or rankings. The cross-validation and testing results were recorded using accuracy (A), precision (P), recall (R), and F-measure (F).
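The experimental protocol above (rank the features, then evaluate on all features, the top thirteen, and the top six) can be outlined as a short loop. This is a sketch under several assumptions: the data are synthetic with 20 features, only Random Forest is evaluated, hyperparameters are fixed rather than grid-searched, and the ranking is computed on the training set only, which the text does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in: 20 features, 6 of them informative.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Rank features by Extra Trees importance, best first.
ranker = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
ranking = np.argsort(ranker.feature_importances_)[::-1]

results = {}
for k in (X.shape[1], 13, 6):
    cols = ranking[:k]  # the k highest-ranked features
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    # Mean 5-fold cross-validated accuracy on the training set.
    cv_acc = cross_val_score(
        model, X_train[:, cols], y_train, cv=5, scoring="accuracy"
    ).mean()
    # Refit on the whole training set and score on the unseen test set.
    model.fit(X_train[:, cols], y_train)
    test_acc = accuracy_score(y_test, model.predict(X_test[:, cols]))
    results[k] = (cv_acc, test_acc)
    print(k, round(cv_acc, 3), round(test_acc, 3))
```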

4.2. Results of Applying Feature Selection Methods

This section describes the results of applying the univariate and Extra Trees feature selection methods to the dataset.

4.2.1. Univariate Feature Selection Method

The univariate method assigns a score to each feature, and we selected the best features based on high scores. Table 2 shows the scores of all features obtained by applying the univariate method to the dataset. The oucontent activity has the highest score at 5494843.899, and the forumng activity has the second-highest score at 3793119.894. The html activity has the lowest score among the activities at 1012.523433. Code presentation has the lowest score of all features at 0.377850061.

4.2.2. Extra Trees Feature Selection Method

Extra Trees assigns a ranking to each feature, and we selected the best features based on high ranking. Figure 2 shows the ranking of all features obtained by applying Extra Trees to the dataset. Homepage and quiz have the highest rankings at 12.5 and 12.24, respectively. The repeat activity has the lowest ranking at 0.01. Resource, url, and code module have similar rankings at 6.78, 6.61, and 6.15, respectively.

4.3. Results of Applying ML Algorithms to Full Features

The ML algorithms were applied to the full feature set, and the cross-validation and testing results are shown in Table 3. In cross-validation, RF registered the highest performance (A = 88.14%, R = 88.12%, and F = 88.15%), while NB recorded the lowest (A = 69.79%, R = 69.79%, and F = 68.89%). KNN recorded the second-highest performance (A = 83.74%, R = 83.74%, and F = 83.74%). In testing, RF registered the highest performance (A = 86.88%, R = 86.88%, and F = 86.89%), while NB recorded the lowest (A = 69.44%, R = 69.44%, and F = 68.51%). KNN recorded the second-highest performance (A = 82.23%, R = 82.23%, and F = 82.24%).

4.4. Results of Applying ML Algorithms to Thirteen Features

Each of the two feature selection methods is applied, and the thirteen features with the highest rankings or scores are selected. The ML algorithms are then applied, and the cross-validation and testing results are recorded.

4.4.1. Thirteen Selected Features by Univariate

The top thirteen features by univariate score are oucontent, forumng, quiz, homepage, subpage, ouwiki, resource, url, oucollaborate, glossary, dataplus, questionnaire, and externalquiz. The ML algorithms were applied to these thirteen features, and the cross-validation and testing results are shown in Table 4.

In cross-validation, RF registered the highest performance (A = 86.5%, R = 86.48%, and F = 86.55%), while NB recorded the lowest (A = 66.19%, R = 66.19%, and F = 65.52%). KNN recorded the second-highest performance (A = 83.63%, R = 83.63%, and F = 83.63%). In testing, RF registered the highest performance (A = 85.72%, R = 85.72%, and F = 85.73%), while NB recorded the lowest (A = 65.19%, R = 65.19%, and F = 64.37%). KNN recorded the second-highest performance (A = 82.32%, R = 82.32%, and F = 82.33%).

4.4.2. Thirteen Selected Features by Extra Trees

The top 13 features by Extra Trees ranking are homepage, quiz, oucontent, subpage, forumng, resource, url, code module, ouwiki, oucollaborate, page, questionnaire, and glossary. The ML algorithms were applied to these 13 features, and the cross-validation and testing results are shown in Table 5.

In cross-validation, RF registered the highest performance (A = 87.6%, P = 88.05%, R = 87.71%, and F = 87.7%), while NB recorded the lowest (A = 68.82%, R = 68.82%, and F = 67.98%). KNN recorded the second-highest performance (A = 83.72%, R = 83.72%, and F = 83.73%). In testing, RF registered the highest performance (A = 86.72%, P = 87.08%, R = 86.72%, and F = 86.73%), while NB recorded the lowest (A = 68.38%, R = 68.38%, and F = 67.45%).

4.5. Results of Applying ML Algorithms to Six Selected Features

After applying the two feature selection methods, the six features with the highest rankings or scores are selected. The ML algorithms are then applied, and the cross-validation and testing results are recorded.

4.5.1. Six Selected Features by Univariate

The top six features by univariate score are oucontent, forumng, quiz, homepage, subpage, and ouwiki. The ML algorithms were applied to these six features, and the cross-validation and testing results are shown in Table 6.

In cross-validation, RF registered the highest performance (A = 84.41%, R = 84.38%, and F = 84.44%), while NB recorded the lowest (A = 65.36%, R = 65.36%, and F = 64.65%). KNN recorded the second-highest performance (A = 83.38%, R = 65.36%, and F = 64.65%). In testing, RF registered the highest performance (A = 84.41%, R = 84.41%, and F = 84.42%), while NB recorded the lowest (A = 82.05%, R = 64.01%, and F = 64.37%). KNN recorded the second-highest performance (A = 82.32%, R = 82.05%, and F = 82.07%).

4.5.2. Six Selected Features by Extra Trees

The top six features by Extra Trees ranking are homepage, quiz, oucontent, subpage, forumng, and resource. The ML algorithms were applied to these six features, and the cross-validation and testing results are shown in Table 7.

In cross-validation, RF registered the highest performance (A = 85.5%, R = 85.48%, and F = 85.54%), while NB recorded the lowest (A = 66.14%, R = 66.14%, and F = 65.66%). KNN recorded the second-highest performance (A = 83.27%, R = 83.27%, and F = 83.73%). In testing, RF registered the highest performance (A = 85.06%, R = 85.06%, and F = 85.08%), while NB recorded the lowest (A = 65.52%, R = 65.52%, and F = 65.06%). KNN recorded the second-highest performance (A = 82.16%, R = 82.16%, and F = 82.16%).

4.6. Discussion

Overall, RF achieved the highest performance in every experiment. Figure 3 displays the best model (RF) for the 13 selected features. As can be seen, RF achieved its best performance with the features selected by Extra Trees, in both cross-validation (A = 87.6%, P = 88.05%, R = 87.71%, and F = 87.7%) and testing (A = 86.72%, P = 87.08%, R = 86.72%, and F = 86.73%). Figure 4 displays the best model (RF) for the 6 selected features. Here too, RF achieved the highest performance with Extra Trees, in both cross-validation (A = 85.5%, R = 85.48%, and F = 85.54%) and testing (A = 85.06%, R = 85.06%, and F = 85.08%).

5. Conclusion

This paper proposed a multiagent e-learning system to examine the interactions between agents that affect the e-learning process in the educational environment. The proposed framework consists of the following steps: data collection, data preprocessing, integrating multiagents, feature selection, training and optimizing ML algorithms, and evaluating the performance of the ML algorithms. In the integration step, the agents, stored as tables named course, student’s info, student’s vle, and VLE, were combined into one table using a left join. In the feature selection step, the univariate and Extra Trees Classifier feature selection methods were used to select the most relevant attributes, which play an important role in enhancing our multiagent framework. Five machine learning algorithms (DT, RF, LR, NB, and KNN) were applied to the selected high-ranked and relevant features. Their performance was evaluated using four metrics: accuracy (ACC), precision (PRE), recall (REC), and F-measure (FM). The results showed that RF with the 13 features selected by Extra Trees achieved the highest performance among the experiments on selected features, for cross-validation (ACC = 87.6%, PRE = 88.05%, REC = 87.71%, and FM = 87.7%) and testing (ACC = 86.72%, PRE = 87.08%, REC = 86.72%, and FM = 86.73%).

Data Availability

The Open University Learning Analytics dataset can be downloaded from https://www.kaggle.com/rocki37/open-university-learning-analytics-dataset.

Conflicts of Interest

All authors declare that they have no conflicts of interest.

References

[1] A. Tarik, H. Aissa, and F. Yousef, “Artificial intelligence and machine learning to predict student performance during the covid-19,” Procedia Computer Science, vol. 184, pp. 835–840, 2021.
[2] J. D. Pineda-Jaramillo, “A review of Machine Learning (ML) algorithms used for modeling travel mode choice,” Dyna, vol. 86, no. 211, pp. 32–41, 2019.
[3] W. Jin, “Research on machine learning and its algorithms and development,” Journal of Physics: Conference Series, vol. 1544, no. 1, Article ID 012003, 2020.
[4] Z. Gao and B. Wu, “Research on the innovation system of university production and education integration based on computer big data,” IOP Conference Series: Earth and Environmental Science, vol. 692, no. 2, Article ID 022025, 2021.
[5] C. Giuffra, R. Silveria, and R. A. Silveira, “A multi-agent system model to integrate virtual learning environments and intelligent tutoring systems,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 2, no. 1, p. 51, 2013.
[6] A. Khedr, S. Kholeif, and S. Hessen, “Enhanced cloud computing framework to improve the educational process in higher education: a case study of helwan university in Egypt,” International Journal of Computers & Technology, vol. 14, pp. 5814–5823, 2015.
[7] S. Gronauer and K. Diepold, “Multi-agent deep reinforcement learning: a survey,” Artificial Intelligence Review, pp. 1–49, 2021.
[8] A. Khedr and A. M. Idrees, “Adapting load balancing techniques for improving the performance of e-learning educational process,” Journal of Computers, vol. 12, pp. 250–257, 2017.
[9] K. E. Ehimwenma and S. Krishnamoorthy, “Design and analysis of a multi-agent e-learning system using prometheus design tool,” 2020, https://arxiv.org/abs/2007.09645.
[10] A. Bokolo, G. Peremoboere Maureen, and M. Abdul Majid, “A web deployed multi-agent based approach for student-lecturer appointment scheduling in institutions of higher learning,” Journal of Physics: Conference Series, vol. 1830, no. 1, Article ID 012007, 2021.
[11] A. O. Salau and S. Jain, “Feature extraction: a survey of the types, techniques, applications,” in Proceedings of the 2019 International Conference on Signal Processing and Communication (ICSC), pp. 158–164, Noida, India, March 2019.
[12] H. Liu, S. Li, H. Wang, Y. Huo, and J. Luo, “Adaptive synchronization for a class of uncertain fractional-order neural networks,” Entropy, vol. 17, no. 10, pp. 7185–7200, 2015.
[13] H. Liu, S. Li, G. Li, and H. Wang, “Adaptive controller design for a class of uncertain fractional-order nonlinear systems: an adaptive fuzzy approach,” International Journal of Fuzzy Systems, vol. 20, no. 2, pp. 366–379, 2018.
[14] V. B. Kolachalama and P. S. Garg, “Machine learning and medical education,” NPJ digital medicine, vol. 1, no. 1, pp. 54–63, 2018.
[15] W. Villegas-Ch, M. Román-Cañizares, and X. Palacios-Pacheco, “Improvement of an online education model with the integration of machine learning and data analysis in an lms,” Applied Sciences, vol. 10, no. 15, p. 5371, 2020.
[16] B. Albreiki, N. Zaki, and H. Alashwal, “A systematic literature review of student' performance prediction using machine learning techniques,” Education Sciences, vol. 11, no. 9, p. 552, 2021.
[17] W. K. Mutlag, S. K. Ali, Z. M. Aydam, and B. H. Taher, “Feature extraction methods: a review,” Journal of Physics: Conference Series, vol. 1591, no. 1, Article ID 012028, 2020.
[18] M. Mohammed, M. Khan, and E. Bashie, Machine Learning: Algorithms and Applications, CRC Press, Boca Raton, FL, USA, 2016.
[19] C. Verma, V. Stoffová, and Z. Illés, “Prediction of residence country of student towards information, communication and mobile technology for real-time: preliminary results,” Procedia Computer Science, vol. 167, pp. 224–234, 2020.
[20] M. Marsigit, H. Retnawati, E. Apino et al., “Constructing mathematical concepts through external representations utilizing technology: an implementation in irt course,” TEM Journal, vol. 9, no. 1, pp. 317–326, 2020.
[21] A. S. Hashim, W. A. Awadh, and A. K. Hamoud, “Student performance prediction model based on supervised machine learning algorithms,” IOP Conference Series: Materials Science and Engineering, vol. 928, no. 3, Article ID 032019, 2020.
[22] A. K. Hamoud, A. S. Hashim, and W. A. Awadh, “Predicting student performance in higher education institutions using decision tree analysis,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 2, pp. 26–31, 2018.
[23] G. Sahebi, P. Movahedi, M. Ebrahimi, T. Pahikkala, J. Plosila, and H. Tenhunen, “GeFeS: a generalized wrapper feature selection approach for optimizing classification performance,” Computers in Biology and Medicine, vol. 125, Article ID 103974, 2020.
[24] P. Sokkhey and T. Okazaki, “Study on dominant factor for academic performance prediction using feature selection methods,” International Journal of Advanced Computer Science and Applications, vol. 11, no. 8, 2020.
[25] J. Kuzilek, M. Hlosta, and Z. Zdrahal, “Open university learning analytics dataset,” Scientific Data, vol. 4, no. 1, pp. 1–8, 2017.
[26] D. Baby, S. J. Devaraj, J. Hemanth, and A. R. Mm, “Leukocyte classification based on feature selection using extra trees classifier: a transfer learning approach,” Turkish Journal of Electrical Engineering and Computer Sciences, vol. 29, no. SI-1, pp. 2742–2757, 2021.
[27] F. Budiman, “Svm-rbf parameters testing optimization using cross validation and grid search to improve multiclass classification,” Scientific Visualization, vol. 11, no. 1, pp. 80–90, 2019.
[28] S. García, J. Luengo, and F. Herrera, “Feature selection,” in Data Preprocessing in Data Mining, pp. 163–193, Springer, Heidelberg, Germany, 2015.
[29] M. Manessa, K. Setiawan, M. Haidar et al., “Optimization of the random forest algorithm for multispectral derived bathymetry,” International Journal of Geoinformatics, vol. 16, no. 3, pp. 1–6, 2020.
[30] N. Mohapatra, K. Shreya, and A. Chinmay, “Optimization of the random forest algorithm,” in Advances in Data Science and Management, pp. 201–208, Springer, Heidelberg, Germany, 2020.
[31] J. Cai, J. Luo, S. Wang, and S. Yang, “Feature selection in machine learning: a new perspective,” Neurocomputing, vol. 300, pp. 70–79, 2018.
[32] G. Sumalatha and S. Archana, “A study on early prevention and detection of breast cancer using data mining techniques,” International Journal of Innovative Research in Computer and Communication Engineering, vol. 5, no. 6, pp. 11045–11050, 2017.

Copyright

Copyright © 2022 Shrouk H. Hessen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
