#### Abstract

Machine learning (ML) and deep learning (DL) algorithms work well where future estimations and predictions are required. Particularly, in educational institutions, ML and DL algorithms can help instructors in predicting the learning performance of learners. Furthermore, the prediction of the learning performance of learners can assist instructors and intelligent learning systems (ILSs) in taking preemptive measures (i.e., early engagement or early intervention measures) so that the learning performance of weak learners could be increased thus reducing learners’ failures and dropout rates. In this study, we propose an intelligent learning system (ILS) powered by the mobile learning (M-learning) model that predicts learners’ performance and classify them into various performance groups. Subsequently, adaptive feedback and support are provided to those learners who struggle in their studies. Four M-learning models were created for different learners considering their learning features (study behavior) and their weight values. The M-learning model was based on the artificial neural network (ANN) algorithm with the aim to predict learners’ performance and classify them into five performance groups, whereas the random forest (RF) algorithm was used to determine each feature’s importance in the creation of the M-learning model. In the last stage of this study, we performed an early intervention/engagement experiment on those learners who showed weak performance in their study. End-user computing satisfaction (EUCS) model questionnaire was adopted to measure the attitude of learners towards using an ILS. As compared to traditional machine learning algorithms, ANN achieved the highest prediction accuracy for all four learning models, i.e., model 1 = 90.77%, model 2 = 87.69%, model 3 = 83.85%, and model 4 = 80.00%. Moreover, the five most important features that significantly affect the students’ final performance were MP3 = 0.34, MP1 = 0.26, MP2 = 0.24, NTAQ = 0.05, and AST = 0.018.

#### 1. Introduction

Despite the numerous advantages of mobile learning (M-learning), the practicality of M-learning had been limited due to not knowing about the exact behavior or learning features of M-learners and how these learning features have different meanings and values for different M-learners [1]. The use of mobile phones for E-learning, U-learning, and M-learning creates additional challenges for the delivery of the adaptive learning content [2]. First, the limited space on a mobile phone screen does not allow the learning content to be presented in the same way as it is presented on web and desktop applications. Moreover, due to low and inconsistent internet speed, mobile learners feel frustration when accessing the learning content during the learning process. Nonetheless, M-learning has several advantages over traditional and web-based learning systems [3]. M-learning-based education has already reached many learners as it provides the possibility of an adaptation process that is difficult to achieve in traditional classroom settings. Mobile learners (M-learners) using mobile phone technologies can access and study the learning content at any time and from anywhere. In M-learning settings, M-learners have the freedom to choose the learning content they want to learn, unlike traditional classroom settings where learners are bound to the instructor’s provided learning content and instructions.

Adaptive learning (AL) is an educational method that uses computer algorithms to orchestrate the interaction with the learner and delivers tailored learning content according to the needs and performance of each learner [4]. In AL systems, learning content is presented according to the unique needs of each learner as indicated by learner responses, interactions, and feedback. Though the primary target of AL systems is education, they can be used in a variety of areas, i.e., professional training, business education, simulation, etc. AL systems have been designed primarily for desktop applications and web applications and recently have been the prime target for mobile learning systems [5].

Providing tailored learning content is one of the challenges that an adaptive learning system faces. To provide needed support and to assist learners at the optimal time, it is necessary to determine the exact learning behavior, the strength, and weaknesses of different learners. The learning features determine the learning behavior of various learners which when collected and processed carefully can help adaptive learning systems to guide learners properly. Machine learning (ML) and deep learning (DL) algorithms can model the learning behavior of learners when the corresponding learning features are properly provided. ML/DL algorithms, when trained adequately, can determine the features that have a significant impact on the performance of learners. Moreover, ML/DL multiclass classification algorithms can classify different learners into performance groups according to their previous interaction and performance.

Intelligent learning system (ILS) integrated with machine learning (ML) techniques can be used to create an M-learning model that properly reflects the learning behavior of individual learners. A carefully developed M-learning model based on proper learning features can predict and identify learners’ strengths, weaknesses, preferences, and performance. ML techniques that have been used to develop the M-learning model include *K*-nearest neighbors (KNNs) [6], linear discriminant analysis (LDA) [7], multiclass logistic regression (softmax regression) [8], support vector machine (SVM) [9], decision trees (DTs) [10], random forest (RF) [11], and artificial neural networks (ANNs) [12]. Fundamental steps in developing an M-learning model include defining proper learning features (time, location, preferences, performance, etc.), generating a dataset of learning features, features’ analysis, preprocessing, transformation into the appropriate form, choosing proper ML algorithm, training, testing, and reevaluating the M-learning model [13]. When developed and matured, the M-learning model can be used as a part of ILS for intelligent guidance and adaptive support. During the learning process, the M-learning model can recommend additional learning content if they need it or let the learner move ahead if the current topic is already covered. An adaptive M-learning system performs adaptation based on the M-learning model and updates the M-learning model when new facts are driven from M-learners’ learning features. Information gathered from the interaction mechanism of M-learning systems is often imprecise, and its interpretation is typically uncertain. Machine learning algorithms are commonly used for learner modeling because of their ability to represent complex feature relationships, feature weights, and combined features’ effect on the learning behavior of individual learners in the learning environment [14].

Models based on DL algorithms can focus on the right features by themselves, requiring little intervention from human programmers. ML and DL algorithms have the single most important goal of AI research: allowing computers to understand and model our real world well enough to exhibit something identical to what we human beings call intelligence. Some of the popular DL algorithms are artificial neural network (ANN) [15], convolutional neural network (CNN) [16], recurrent neural network (RNN) [17], self-organizing maps (SOMs) [18], deep Boltzmann machine (DBM) [19], and deep autoencoder [20].

The contributions of this study are as follows:(1)Creating an M-learning dataset: the dataset includes learning features’ instances/records generated during the learning process and logged on an online Firebase cloud. The dataset is used as an input to the ANN for the development of the M-learning model.(2)M-learners’ behavior modeling: we used the M-learners’ dataset to model learning in the M-learning environment, i.e., creating the M-learning model. In modeling M-learners, the important step is to determine the effect of independent learning features on dependent features, i.e., M-learners’ performance (final grades).(3)Using the artificial neural network (ANN) for learners’ classification: ANN (with hidden layers) is a type of deep neural network or deep learning algorithm that is used for M-learners’ classification, important features’ identification (learning features), and feature weight determination.(4)Deployment of the M-learning model on mobile devices: in this step, we deployed the M-learning model on mobile devices of M-learners to determine how accurately the learning performance of M-learners is predicted and how the M-learning model helps M-learners in the learning process and making the right decision.(5)Performing early intervention/engagement experiment: the objective of this experiment was to determine whether early intervention by ILSs motivates M-learners to improve their performance. The result of this experiment revealed that early intervention could be a very effective technique in encouraging weak learners to improve their learning behavior.

#### 2. Related Work

Recently, Kotsiantis presented a novel case study, where he gathered demographics and learning data from written assignments for the ML regression method to estimate learners’ future performance [21]. In this experimental study, Kotsiantis revealed that the education domain offers many interesting applications to ML. A learner model was developed based on the mapping between predictors’ features and target feature. Linear regression ML technique was compared to other popular and benchmark ML techniques such as model trees (MT, the counterpart of decision trees (DTs)) [22], artificial neural networks (ANNs) [23], locally weighted linear regression (LWLR) [24], and support vector machine (SVM) [25]. The comparison results revealed that model trees (MTs) give accurate results for the construction of a software support learning tool.

ML techniques used for educational purposes can be divided into classification/regression algorithms, sequential pattern analysis, association rules, clustering, and web information mining [26]. Campbell proposed and developed a learner model using linear regression that demonstrated that learner SAT scores are insignificantly predictive of learner success [27]. With the inclusion of the LMS login feature, the predictive power of the model was tripled. The author demonstrated that the login feature is directly proportional to the level of effort. The more log the learner performed, the more the effort he/she is undertaking. It was revealed that learners with a low or average number of SAT scores could achieve success in the final grade through above-average levels of effort.

One of the popular and simple ML techniques used in educational settings is association rule mining [28]. Like other ML techniques, association rule mining tries to find hidden patterns in the data. Association rule mining finds patterns by analyzing features that occur together and highly correlated features. Rules in association rule mining are not extracted from users’ behavior or preference but rather finding relationships between elements in every distinct transaction.

Numerous research studies in the past have been carried out that used unsupervised clustering algorithms to group learners according to similar learning features. Anaya and Boticario used a clustering technique called expectation maximization (EM) that groups similar important features without knowing in advance about the structure of the data [29]. In clustering algorithms, the features are grouped together according to their similarities by applying the Euclidean metric. Learners were grouped by clustering techniques according to their online collaboration in distance learning settings. The results indicated that, for successful collaborative learning, regular and frequent evaluation of team collaboration is necessary. Lin et al. applied two-stage clustering (self-organizing maps and *K*-means) on training data of automobile driving schools [30]. The result of clustering served as a reference point for future training courses. The authors also developed an educational training prediction model using a backpropagation neural network. The prediction model was used to create a knowledge management system that assists the learner in automobile training courses.

Deep learning (DL) is the most trending and discussed research area in the field of AI. DL is a subfield in ML that uses hidden layers in neural network models to represent data both at a granular and abstract level. DL algorithms have small processing units inside hidden layers called neurons or perceptions that can apply linear and nonlinear transformations to the input data. Different DL models and architectures have been proposed, developed, and successfully applied in both supervised and unsupervised problems in the fields of image processing, natural language processing, handwriting recognition, self-driving cars, and computer vision. DL algorithms form a hierarchy of multiple hidden layers where high layers drive data from lower layers. Table 1 lists the popular DL architecture, comparisons with baseline algorithms, evaluation method, and performance score.

#### 3. Proposed Mechanism

Mainly, we have divided our research work into four phases. The first phase is about mobile learning system (Learnit) development, testing, and deployment. The development of the mobile learning system was necessary for feature generation and gathering. Phase two describes how learners used Learnit, the types of learning features, and how learning features were gathered from it. Phase three (discussed in Section 4) gives a complete description of how different machine and deep learning algorithms were applied to learning features for analysis and examination. Phase four (also discussed in Section 4) detailed results generated from the analysis and how results can be used in the adaptive M-learning process. Figure 1 shows the steps carried out during M-learning model development, testing, integration, features’ collection, deployment, early intervention, and conducting end-user computing satisfaction (EUCS) experiment.

##### 3.1. Mobile Learning System (Learnit)

Figure 2 shows the interaction of learners with different activities of the Learnit app, whereas Figure 3 shows learning, problem posting, and quiz-taking activities. For this study, three courses (computer basics, C++ programming, and JAVA programming) were offered to learners. While using Learnit, a learner can select any course he/she is interested in. Each course is further divided into three learning modules. Each learning module consisted of three learning topics. After the completion of each module, a quiz activity was offered to learners to evaluate their learning. A final quiz/checkpoint was administered after the completion of three courses to evaluate the overall learning outcomes of the learners. Learners were provided with two types of learning content (video and PDF text) for every learning topic.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

##### 3.2. Features’ Extraction

As mentioned earlier, the learners’ features were extracted and gathered from learners while they used the Learnit app. Table 2 shows the preprocessed features along with their category and description.

875 learners from the Institute of Computing, Department of Physics, and Institute of Numerical Science participated in using the Learnit app. After data cleansing and security, 850 learners’ data were finally selected to be included in the dataset and to be further used as an input to the learner model. These features were further preprocessed and converted into a form that the learner model can accept as an input. The features’ preprocessing steps include features’ encoding, features’ scaling, and splitting features into training and testing sets.

#### 4. ANN-Based M-Learning Model (Modeling Learner Using ANN)

To develop the M-learning model, ANN with two hidden layers was used as shown in Figure 4. For the ANN-based learning model, the most important element is historical feature instances. The more the feature instances are provided to the learner model, the better it will learn and will generate accurate results. In this study, the ANN-based learner model should accurately categorize learners into 5 (A, B, C, D, and F) different groups based on their learning performance.

##### 4.1. Training ANN-Based M-Learning Model

Training the ANN-based M-learning model includes the following steps:(1)Feeding the M-learning model with feature instances, i.e., feed-forward technique or forward propagation where data flow in one direction, from the input to the output layer.(2)Selecting the activation function for hidden layer neuron activation. In our experiment, we have selected the ReLU activation function.(3)Using the activation function at the output layer. As our problem is categorical, therefore, we have used the softmax function.(4)Backpropagation: to reduce the difference between the actual result and generated result, i.e., to reduce the loss.

Forward propagation is the core process during the model training and learning phase. It is the forward propagation along with the backpropagation technique, where a model learns the weights and bias values for neural network synapses. Our input features’ data are in the form of , where 13 is the number of input features and *n* is feature instances/records. The default configuration of the ANN accepts one feature instance at a time, but multiple feature instances can be entered into the ANN by a technique called matrix multiplication. Using matrix multiplication, multiple feature instances can enter the ANN at a time, speeding up the training and learning process as a result. To perform matrix multiplication, we defined two matrices, *X* and , where *X* represents the input features’ data and is the weight of the synapse between the input layer and hidden layer 1. Matrix *X* is in the form , where *M* is the input features and *N* is the feature instances. Similarly, the matrix represents the weights of the synapses between the input layer and hidden layer 1, having dimensions, where 13 is the input neurons and 11 represents the values of the weights of the synapses attached to each input neuron (11 synapses per neuron). We used the Python Dense class to initially initialize weights on synapses. Later, synapses’ weights get updated according to the error value backpropagated from the output layer. Mathematically, matrices *X* and can be represented as

Matrix *X*, when multiplied by matrix , yields matrix having dimensions as illustrated in equation (2). For simplicity, the resultant matrix multiplication process is represented in equation (3).

Two processes are performed during ANN forward propagation when input data are transferred from the input layer to hidden layer 1. First, at each neuron, at hidden layer 1, the input feature value is multiplied by synapses’ weights. This multiplication task is done at each neuron in hidden layer 1 for all features’ data and synapses connected to the corresponding neuron. The multiplication results are then added together, and the activation function is applied to the result. The processing at each neuron in hidden layer 1 can be summarized by the following equation. represents the activation function performed after the sum of the product operation.

Secondly, the activation function is performed on the result obtained from the sum of the product operation. We used the rectified linear unit (ReLU) activation function at each hidden layer neuron because it is computationally less expensive, efficient, and can easily propagate errors during the backpropagation process. ReLU activation function is represented in the following equation:where is

The ReLU activation function is applied independently to each entry in matrix , and a new matrix is generated called . The size of matrix is the same as that of matrix . All the neurons at hidden layer 1 perform the ReLU activation function on their input matrix and generate modified matrices ready for feed-forwarding to hidden layer 2. The matrix data are propagated forward from hidden layer 1 to hidden layer 2 in the same way as it was propagated from the input layer to hidden layer 1. Similarly, the operation at hidden layer 2 neurons is the same as hidden layer 1 neurons’ operation. At hidden layer 2 neurons, first, the input values from hidden layer 1 are multiplied by corresponding synapses’ weights and then added together to perform the ReLU activation function. The matrix stored at hidden layer 1 has an dimension, whereas the dimension of the matrix is , where 11 represents neurons at hidden layer 1 and their corresponding weights. The multiplication of the matrix with yields the matrix having the dimension of and can be represented by the following equation:

After the matrix multiplication operation, the ReLU activation function is performed on each entry in matrix resulting in a new matrix called . Mathematically, the equation can be written as

Similarly, using the forward propagation technique, matrix at hidden layer 2 is propagated to the output layer. The neurons at the output layer, after performing softmax activation function (for categorical/multiple classification results) on input data, generate *p*_{j}̂, which is the ANN-based M-learning model-predicted value of the learner’s final grade (FG). It is very likely that initially, there is a difference between the predicted result *p*_{j}̂ and actual result “*a*.” For minimizing the difference between the predicted result and the actual result, we used a technique called backpropagation, which is further explained in the following section.

###### 4.1.1. Minimizing Model Cost/Loss Using Backpropagation

M-learning model cost or loss function tells us the difference between the predicted and actual output result. Ideally, the predicted result should be equal or very near to the actual result, but in a scenario where the difference is substantial, the backpropagation technique is used to reduce the difference between predicted and actual output results. We expressed the learner model cost function in the following expression:

In the above expression, “*a*” is the learner’s actual final grade, whereas *p*_{j}̂ is the predicted final grade result. For an accurate and reliable learner model, this difference must be as minimum as possible. As a result, the M-learning model will convey an accurate picture of learners’ learning behavior and performance. For minimizing the learner model cost function, we had two choices: (1) changing the input features’ data and (2) tweaking weights on ANN synapses. Input feature instances are static and will not change; therefore, the only option left with us is to adjust weights on ANN synapses. To determine suitable weights for synapses, we used a technique called stochastic gradient descent (SGD) along with backpropagation. SGD technique is efficient and reduces time and computation costs during the backpropagation process. Furthermore, when we have high-dimensional data and we are looking for suitable weights across millions of synapses, the SGD technique is more impressive in determining suitable synapses’ weights and therefore reducing the overall model cost. Looking at our ANN model architecture, we can observe that the weights that are responsible for generating cost/error are spread across three matrices, i.e., , , and . We calculated the derivative of cost *C* concerning , , and , i.e., , , and , resulting in the same number of gradient values as the number of synapses’ weights across the ANN. The derivative of *C* concerning , , and determines how many weight values are responsible for generating error/cost in the M-learning model.

The key in the training of the M-learning model is to find the appropriate values for weights across the model so that the model can build good approximator functions that can be applied to the new learner features for their performance prediction. In other words, we want to have a set of weights that minimize the error/loss/cost at the output layer. To demonstrate the backpropagation process in detail, consider the simplified M-learning model in Figure 5.

If the weights get changed, the values across hidden layers 1 and 2, , , and the output result would also change. Here, the output result is functional dependent on , , and weights and activation function in hidden layers 1 and 2. We can, mathematically, formulate the output as an extension of the composite function:

If we have an error on the output layer, then it means that the error is functionally dependent on the output function which in turn is dependent on the function of input, weights, and activation functions. To know how many arbitrary weights, i.e., , are responsible for the generated error, we compute the derivative of the error w.r.t , and the result would be

In the backpropagation process, we are traversing from the output error to , taking iterative steps using the chain rule. We take the derivative of the error w.r.t the output, derivative of the output w.r.t to hidden neurons, and finally derivative of hidden neurons w.r.t weights. This process is shown in Figure 6.

To get a comprehensive intuition of the backpropagation process, we are going to determine the derivative of the error w.r.t the weights at the input layer denoted by in Figure 7. The architecture of the learner model in Figure 7 is simplified for a better understanding of the backpropagation process. Mathematically, we are trying to obtain how many input layer weights are responsible for the error as shown in the following equation:

Looking at Figure 7, we also know that

This implies that the derivative of the output error w.r.t to input layer weights is equal to the derivative of the error w.r.t the predicted result multiplied by the derivative of the predicted result w.r.t input layer weights. The term tells us the derivative of the error w.r.t output prediction which can be written as .

Therefore, the equation becomes

Going one layer further back and applying the chain rule, we can write as

From equation (15), we can deduce that the derivative of the predicted result w.r.t input weights is equal to the derivative of the output w.r.t the derivative of the input (*p*_{i}) at the output layer multiplied by the derivative of the input value w.r.t input weights. Similarly, applying the chain rule, can be written as

In the above expression, the term at the right-hand side corresponds to the derivative of the ReLU function which is equal to the output result multiplied by one minus the output. *P*_{j} denotes the output of the activation function at the output layer. Now, equation (15) can be written as

Moving backward to hidden layer 2, we can notice that the red lines spread out in multiple directions. Since multiple weight values affect the value of *p*_{i}, we must take all of them into derivative which can be represented by sigma notation:

Looking at Figure 7, we can observe that *p*_{i} is the summation of hidden layer 2 weights multiplied by *z*_{j} (activation performed on the sum of products). Taking the derivative of *pi* with *z*_{j} would give us connecting weights between hidden layer 2 and the output layer and can be expressed as

We can express the rate of change of *p*_{i} and *z*_{j} w.r.t as

Since we cannot quite put the derivative of *z*_{j} w.r.t into a numerical form, we continue to use the chain rule and follow the fourth red line back in Figure 7 to determine that

Since *z*_{j} is the result of applying the activation function to *z*_{i}, therefore, we use the same logic as in the previous layer and apply the ReLU derivative. The derivative of *z*_{i} w.r.t established by the fifth line back is based on the same idea of spreading out and taking the summation of all contributions:

We move on to performing more reduction and find the derivative of *y*_{j} w.r.t .

Traversing further back in our network, we notice that, at red line number six in Figure 7, we still must deal with another activation function. The last line in the backpropagation process traverses from the input at *y*_{i} to *x*_{j} which represent the weights between the input and hidden layer 1. In fact, are the weights which we were attempting to backprop to. At this point, we can determine the change in *y*_{i} w.r.t which can be expressed by the following equation:

Notice here that the coefficient of is *x*_{j}. Combining all the individual expressions gives us the final expression. Here, the final expression tells us how many weights to adjust at each layer to reduce to the generated error.

##### 4.2. Implementation of the M-Learning Model Using the Artificial Neural Network (ANN)

We developed and implemented the M-learning model using the ANN. Firstly, we initialized the weights of synapses with numbers very close to zero with the help of the Keras Dense class. Secondly, the Keras Sequential class was used to propagate dataset features’ observations from the ANN input layer to the output layer. During forward propagation, the activation function at every neuron in the ANN is restricted by the weights of the synapses. The output of the M-learning model is learners’ performance grades, i.e., A, B, C, D, and F. Table 3 shows learners’ performance scores classified into five-level grades.

We assumed that features such as module performance 1 (MP1), module performance 2 (MP2), and module performance 3 (MP3) have the highest impact on the learners’ final grades; therefore, we performed multiclass classification on four learning models as discussed in the following briefly:(1)M-learning model 1: this model contains all the features from the dataset except final grades(2)M-learning model 2: this model is similar to M-learning model 1 but without the MP3 feature(3)M-learning model 3: this model is similar to M-learning model 2 but without the MP2 feature(4)M-learning model 4: this model is similar to M-learning model 3 but without the MP1 feature

We compiled M-learning model 1 and achieved an accuracy of 90.77 percent. To better analyze and visualize the results of M-learning model 1, we constructed a confusion matrix on actual grade scores and predicted grade scores as shown in Figure 8 (without normalization and normalized confusion matrices). The confusion matrix for M-learning model 1 revealed that 107 out of 130 grades were predicted correctly, whereas the remaining 23 grades were predicted incorrectly.

**(a)**

**(b)**

The confusion matrices (without normalization and normalized) for M-learning models 2, 3, and 4 are shown in Figures 9–11. The diagonal elements in confusion matrices are those elements that are predicted accurately, whereas the remaining elements are predicted inaccurately. Those elements that are predicted accurately are called true positives, whereas those elements that are predicted inaccurately are called false positives.

**(a)**

**(b)**

**(a)**

**(b)**

**(a)**

**(b)**

From confusion matrices in Figures 9–11, we can observe that the accuracy of models decreases as we remove the modules’ performance, i.e., features MP1, MP2, and MP3 from independent features. This concludes that module performance scores are an important factor in predicting the final grades of learners. Moreover, to know whether our M-learning models suffer from overfitting, we plot training accuracy, testing accuracy, training loss, and testing loss over the number of epochs passed.

Figures 12–15 show the model accuracy and model loss for M-learning models 1–4. In Figure 12, initially, during early epochs, the model accuracy of the training and testing dataset somewhat matched up, but later on, during the last 50 epochs, the model accuracy of the training and testing dataset somewhat deviates, but overall, it does not seem that overfitting is a huge problem for M-learning model 1, and the accuracy of the model is acceptable. The model loss of M-learning model 1 decreases as the number of epochs passes away.

**(a)**

**(b)**

**(a)**

**(b)**

**(a)**

**(b)**

**(a)**

**(b)**

This concludes that M-learning model 1 is successful in minimizing cost/loss function over time and is gradually learning the exact relationship between independent features and final grades. The results of the model accuracy and the model loss for M-learning model 2 and M-learning model 3 are somewhat similar to M-learning model 1 as shown in Figures 13 and 14, but we can observe that the case for model accuracy and model loss for M-learning model 4 is different. In Figure 15, we can perceive that, as more and more epochs are completed, the difference between the model accuracy and model loss of the training set and testing set increases.

This is because now, due to the absence of module performance features (MP1, MP2, and MP3), M-learning model 4 is not able to predict the final grades, and the model loss increases and the model accuracy decreases on the testing set. The results in Figure 15 also divulged that module performance features are important in predicting the overall final grades of learners.

##### 4.3. Multiclass Benchmark Machine Learning Algorithms’ Prediction Accuracy in Comparison with the ANN

Before fitting all four learning models to different machine learning algorithms, some preprocessing was performed. The parameters adopted for artificial neural networks were 200 epochs and a batch size of 5, whereas a total sample of 200 trees was selected for the random forest ensemble method. For support vector machines, we selected sequential minimal optimization (SMO) algorithm. To estimate the predictive accuracy of each algorithm, the 5-fold cross-validation resampling technique was used. Under such a scheme, the data are randomly divided into 5 subsets of equal size. Subsequently, out of 5 subsets, one set is used as the validation data for testing the model, while the remaining 4 subsets are used as a training set. From Table 4, it can be noted that ANN and random forest algorithms have superior prediction accuracy for all four learning models, while linear discriminant analysis exhibited inferior accuracy.

##### 4.4. Features’ Weights and Importance

We used the random forest ensemble method to determine each feature’s importance and weight in predicting the final performance grades. Random forest is based on an adaptive learning methodology that combines several models to generate better results than the individual model result. Feature weights and importance give good evidence of what features contribute most in predicting the final grades of learners. The features’ weights were integrated with the M-learning model to provide adaptive content and tailored navigation to learners during their interaction with the Learnit app. Table 5 shows the values of the weights of each feature in predicting the final grades.

#### 5. Early Intervention Experiment during the Learning Process

In the last stage of our research, we conducted an early intervention experiment to figure out whether preventive measures have an influence on learners in improving their study performance. The experiment was carried out on those learners who achieved performance grades D and F (learners having 11 or less than 11 scores in their final performance quiz). At this point, our M-learning model is trained with learners’ features and is matured enough to predict the learning performance of the existing and new learners. The goal of carrying out an early intervention experiment was to motivate and encourage weak learners to improve their learning performance and make M-learning model predictions false. A total of 425 learners (having D and F grades) were selected and were divided into control and experimental groups (the control group having 212 learners whereas the experimental group having 213 learners). Control group learners were independent of preventive measures and early intervention and received normal learning content, guidance, and quizzes, whereas experimental group learners received motivational triggers, adaptive triggers, and adaptive learning content. Moreover, adaptive navigational paths were recommended to experimental group learners during their learning process. The early intervention experiment lasted for 50 days.

During the early intervention experiment, different adaptive triggers were presented to experimental group learners during their learning process. The adaptive triggers were presented according to learners’ preferred learning content types and how adaptive triggers are assigned to the learning content.

The timings of the adaptive triggers are before and after. Before triggers refer to those triggers that are presented to learners before they have started the learning activity, whereas after triggers mean presenting triggers to learners after they have finished a learning activity. The purpose of adaptive triggers was to assist learners during their learning process and help them in understanding exhaustive and difficult topics. Likewise, different motivational triggers were presented to learners on their smartphones according to their performance scores. Table 6 shows learners’ performance state, the assigned motivational triggers, and their examples.

BJ Fogg in his Fogg behavior model (FBM) suggests that three elements, i.e., ability, motivation, and the trigger (prompt or message), must be present at the same time for a learning behavior to occur [59]. Adaptive triggers aimed to help learners while they study, whereas motivational triggers aimed to increase learners’ motivation towards learning. One of the important factors related to adaptive and motivational triggers is the opportune time. Opportune time is the right time at which adaptive and motivational triggers are sent to the learners on their smartphones. An example of sending motivational triggers at the opportune time is right after the learner quiz activity in case the performance of the learner in that quiz is not good. Similarly, praise and appreciation triggers were sent to the good consistent learners once a week as too many motivational triggers may overwhelm or annoy learners during the learning process.

#### 6. Early Intervention Experiment Results

After 50 days, we analyzed the final score results of the experimental and control group. Figure 16 shows a line graph representing the final score of both the experimental and control group. The line graph confirms that the intervened learners showed overall better performance than unintervened learners in the final quiz. Moreover, the statistical analysis result revealed that intervened learners showed an overall 12.30% higher performance than unintervened learners. These results confirm that proactive or preemptive measures do affect learners’ behavior and can improve their learning performance scores.

##### 6.1. Analyzing Learners’ Satisfaction Using the EUCS Model

To elicit learners’ experience of using the Learnit application, end-user computing satisfaction (EUCS) model was used. As discussed in the proposed system section, the Learnit application is based on the M-learning model which uses an artificial neural network (ANN) to determine how independent learning features affect the final learning performance of learners. The EUCS model is composed of a questionnaire that can be used to measure the attitude of users towards using a computer application. With little modification in the EUCS questionnaire, an online survey was steered on 213 experimental group learners. Six dimensions of the Learnit application, namely, usefulness, ease of use, engaging, timeliness, adaptiveness, and attitude towards the Learnit application, were measured by the EUCS questionnaire. The online survey comprised 12 questions covering the 6 dimensions of the Learnit application. The 6 dimensions were measured on the 5-point Likert scale, where 5 represents “strongly agree,” 4 represents “agree,” 3 represents “neutral,” 2 represents “disagree,” and 1 represents the “strongly disagree” option. During the experiment, the mean learner satisfaction was set to greater than 4, which infers that overall, the learners were satisfied while they used the Learnit application. Table 7 shows the 6 dimensions, the related questions, and learners’ satisfaction mean score.

The response to dimensions 1 and 2 (usefulness) indicates that the Learnit app was successful in increasing the job ability, i.e., programming skills of learners (*m* = 4.68 and *m* = 4.59). The response to dimensions 3 and 4 shows that the Learnit app interaction with learners was very simple and user-friendly (*m* = 4.52 and *m* = 4.58). The response to dimensions 5 and 6 indicates that the Learnit app was able to draw the attention of learners in giving time to learn computer programming using mobile devices (*m* = 4.78 and *m* = 4.71). The response to dimensions 7 and 8 implies that the Learnit app considered the experimental group preferred learning time and sent the programming content accordingly (*m* = 4.34 and *m* = 4.53). The learner replies to dimensions 9 and 10 revealed that tailored programming content was delivered to learners according to their learning abilities and performance by the Learnit app (*m* = 4.78 and *m* = 4.23). The answer to dimensions 11 and 12 reflects that learners agreed to use a Learnit or a similar type of application in the future to increase their programming skills (*m* = 4.55 and *m* = 4.34).

#### 7. Conclusion and Future Work

In this research study, we have developed the M-learning model powered by the ANN. M-learning model is a part of a larger system called ILS. The purpose of ILS is to process and analyze M-learners’ features to determine which features have the most influence on the learners’ performance. The experimental results revealed that the ANN outperformed traditional ML algorithms such as LDA, softmax regression, SVM, DT, and RF and achieved a prediction accuracy of 90.77%, 87.69%, 83.85%, and 80.00% for four M-learning models. Moreover, by using the RF algorithm to determine the most important features having a significant impact on the M-learners’ final performance, we enlighten that MP3, MP1, MP2, NTAQ, and AST have the highest weights and contribute most in increasing M-learners’ performance.

In the future, we would like to test the M-learning model on a larger scale and integrate it with the Learning Management System (LMS). By doing so, the M-learning model will help us in making LMS an adaptive and more user-friendly system. One of the disadvantages of traditional LMS systems is that they are based on a one-size-fits-all approach where all learners are treated equally not considering individual weaknesses, strengths, preferences, and personalized learning behavior. By integrating the M-learning model with LMS, we might be successful in making the learning process self-paced and adaptive. Moreover, we would also like to improve approaches presented in this research study by combining other deep neural network algorithms such as recurrent neural network (RNN), deep autoencoders, self-organizing maps (SOMs), and long short-term memory (LSTM) with the proposed M-learning model [60].

#### Data Availability

The dataset used to support the findings of this study can be found at https://drive.google.com/file/d/1dGvbdfK8d2nexEatqwPhcusk8fwrT7ZI/view?usp=sharing.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported by a grant from the Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University.