Abstract
Aiming to solve the problem that ideological and political education courses in universities are not targeted enough and cannot form personalized recommendations, this paper proposes an ideological and political education recommendation system based on the analytic hierarchy process (AHP) and an improved collaborative filtering algorithm. Firstly, considering the time effect of student scoring, the recommendation model is transformed into a Markov decision process. Then, by combining the collaborative filtering algorithm with reinforcement learning rewards and punishments, an optimization model of student scoring based on timestamp information is constructed. To quantify the degree of students' preference for courses, the analytic hierarchy process is used to convert students' behavior data into course preference values. To solve the problem of data sparsity, missing values are predicted and filled in by rounding the prediction scores and completing the boundaries of the optimization model. Experimental results verify the feasibility of the proposed system and show that it has good accuracy and convergence performance. The ideological and political education recommendation system proposed in this paper has important reference significance for promoting ideological and political education in the era of big data.
1. Introduction
As the main channel for carrying out ideological and political work in colleges and universities, the ideological and political education curriculum is a critical way to practice the education mechanism of colleges and universities. Ideological and political education courses run through the whole process of higher education teaching and are an essential system for universities to cultivate high-quality talents [1]. At present, ideological and political education in colleges and universities mostly adopts collective teaching in large classes and open classes. This approach has many problems, such as its single form, weak pertinence, lack of synergistic effect, and inability to form a personalized collaborative education mechanism. Therefore, it is of great significance to carry out personalized recommendation of ideological and political education courses in combination with the characteristics of students.
With the continuous development of network and information technology, the amount of network information data of ideological education courses is also increasing exponentially. Faced with a variety of ideological education courses, how to carry out personalized recommendations needs to be systematically studied. Personalized recommendation can effectively filter unwanted information by analysing users' behavioural preferences through various recommendation algorithms [2]. Recommendation system can actively provide personalized information for users. At present, personalized recommendation has been widely used in social networking [3], news, music, books, and movies [4], such as cloud music [5] and online shopping product recommendation [6].
Recommendation algorithms are mainly divided into the following three categories [7]. In the first category, collaborative filtering recommendation algorithms make recommendations based on users' intentions [8, 9]. They have achieved significant improvements in recommendation accuracy; however, their explanations of recommendation results are often not intuitive [10]. In the second category, content-based recommendation algorithms conduct feature modeling through various available content information [11, 12]. Because the content of an item is easier for the user to understand, it is often intuitive to explain to the user why the item is being recommended. Collecting the required content information under different recommendation backgrounds is a time-consuming task, which becomes the bottleneck of content-based recommendation algorithms. However, the construction of a knowledge graph can reduce the workload of extracting content information. Therefore, the knowledge graph, as an emerging form of auxiliary data, is attracting the attention of researchers [13]. Deep learning also brings innovative ideas to recommendation systems [14]. However, existing deep-learning-based recommendation algorithms only consider the rating data through matrix decomposition, which limits the recommendation effect [15]. In the third category, hybrid recommendation algorithms [16, 17] can often achieve better recommendation effects by combining the advantages of multiple recommendation algorithms.
As a typical recommendation technique, collaborative filtering mainly includes memory-based and model-based algorithms [18]. The former calculates similarity by analysing the user-item score matrix and makes predictions and recommendations based on that similarity. The latter trains a prediction model on the user's network history, operations, and other data, and then uses this model to predict item scores. Many studies have optimized the recommendation effect by improving collaborative filtering algorithms, for example, with the restricted Boltzmann machine [19], the K-nearest neighbour algorithm [20], and the singular value decomposition (SVD) algorithm [21]. SVD is not only a classical mathematical tool but has also been successfully applied in many engineering applications. In a recommendation system, SVD easily obtains the full-rank decomposition of any matrix, enabling data compression and dimensionality reduction. SVD++ [22] further integrates implicit feedback information on the basis of SVD and uses implicit preferences to optimize the SVD model with better performance. However, SVD++ and SVD do not consider the impact of the timestamp on recommendation performance, and the actual recommendation effect is related to the timestamp to a certain extent. For example, the ratings users gave a product ten years ago differ from those of current users, so it is necessary to improve the model and optimize the prediction effect.
Considering the time effect of student rating, the recommendation model is transformed into Markov decision process. Then, in order to quantify the degree of students' preference for courses, the analytic hierarchy process is used to convert the students' behaviour data into the preference value of courses. Finally, the accuracy of the recommendation model is improved by constructing a student scoring optimization model integrating timestamp information.
To solve the problem that ideological and political education courses in universities are not targeted enough and cannot form personalized recommendations, this paper proposes an ideological and political education recommendation system. The contributions of our paper are summarized in the following three points.
(a) The recommendation model is transformed into a Markov decision process, considering the time effect of student scoring.
(b) An optimization model of student scoring based on timestamp information is constructed by combining the collaborative filtering algorithm with the reinforcement learning reward and punishment process.
(c) To quantify the degree of students' preference for courses, the analytic hierarchy process is used to convert students' behavior data into course preference values.
The rest of the paper is organized as follows. In Section 2, modeling for the system is introduced first, and then the process of training and prediction is introduced. Section 3 gives the experimental results and discussion. At last, Section 4 draws a conclusion of this paper.
2. The Proposed Recommendation System
2.1. Modeling for the System
Firstly, the curriculum recommendation system of ideological and political education is modelled, and the high-dimensional sparse matrix is decomposed into low-dimensional matrices by the singular value decomposition method. Then, students' behaviour data are converted into course preference values through the analytic hierarchy process (AHP). Finally, the recommendation model is transformed into a Markov decision process by establishing a model mapping.
2.1.1. Singular Value Decomposition
In real life, the student-course matrix is large, but because students' interests are limited, each student grades only a small number of courses. The core idea of SVD is to decompose a high-dimensional sparse matrix into low-dimensional matrices. Unlike eigenvalue decomposition, which can only be applied to symmetric matrices, SVD can perform a full-rank decomposition of any M × N matrix to achieve data compression. However, before SVD is used to decompose the matrix, the blank items in the matrix need to be filled to obtain a dense matrix. Assuming that the matrix before filling is R and after filling is R′, the calculation formula is as follows:
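The fill-then-decompose step can be illustrated concretely. The sketch below is a minimal NumPy example on an invented toy student-course matrix (not the paper's data): blank entries are filled with the mean of the observed scores to form the dense matrix, which is then decomposed and truncated to two latent factors.

```python
import numpy as np

# Toy student-course rating matrix (invented); 0 marks an unrated entry,
# which is safe here because real scores lie in [1, 5].
R = np.array([
    [4.0, 0.0, 5.0],
    [3.0, 2.0, 0.0],
    [0.0, 4.0, 4.0],
])

# Fill the blanks with the mean of the observed scores to get the dense R'.
mask = R > 0
R_filled = np.where(mask, R, R[mask].mean())

# Full SVD of the dense matrix, truncated to k latent factors
# for compression and dimensionality reduction.
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(R_approx.round(2))
```

With k equal to the full rank, the product of the three factors reproduces the filled matrix exactly; truncating k trades reconstruction accuracy for dimensionality reduction.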
The calculation formula of the SVD algorithm is as follows:
\[ \hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} p_u, \]
where \( \hat{r}_{ui} \) represents the predicted score value, \( \mu \) represents the average value of the scores, \( b_u \) and \( b_i \) represent the offsets of student u and course i, respectively, \( p_u \) and \( q_i \) correspond to the feature vectors of students and courses on each hidden trait, respectively, and the superscript T stands for transpose.
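As a minimal sketch of this prediction rule (assuming the standard SVD baseline-predictor form; all numeric values below are illustrative stand-ins, not learned parameters):

```python
import numpy as np

def predict_svd(mu, b_u, b_i, p_u, q_i):
    """SVD prediction rule: global mean + student offset + course offset
    + dot product of the latent student and course feature vectors."""
    return mu + b_u + b_i + q_i @ p_u

# Illustrative stand-ins (not learned parameters).
mu = 3.5                       # average of all observed scores
b_u, b_i = 0.2, -0.1           # offsets of student u and course i
p_u = np.array([0.3, -0.5])    # hidden-trait vector of the student
q_i = np.array([0.8, 0.1])     # hidden-trait vector of the course

print(round(predict_svd(mu, b_u, b_i, p_u, q_i), 2))  # 3.79
```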
If a student scores a course, it means he has seen it. Such behaviour contains certain information, so it can be inferred that the behaviour of grading reflects students' preferences. Accordingly, this preference can be reflected in the model in the form of implicit parameters to obtain a more accurate model SVD++.
The calculation formula of the SVD++ model is as follows:
\[ \hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} \left( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right), \]
where \( N(u) \) is the collection of all courses browsed and evaluated by student u, \( y_j \) is the hidden bias for course j reflecting the student's personal preference, and the preference degree of student u consists of the explicit feedback \( p_u \) and the implicit feedback \( |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \).
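The implicit-feedback term can be sketched in the same style; the matrix Y_Nu below is a hypothetical stand-in whose rows are the implicit factor vectors y_j of the courses the student interacted with:

```python
import numpy as np

def predict_svdpp(mu, b_u, b_i, p_u, q_i, Y_Nu):
    """SVD++ rule: the explicit factor vector p_u is augmented with the
    implicit feedback vectors y_j of the |N(u)| interacted courses,
    scaled by |N(u)|^(-1/2)."""
    implicit = Y_Nu.sum(axis=0) / np.sqrt(len(Y_Nu))
    return mu + b_u + b_i + q_i @ (p_u + implicit)

# Illustrative stand-ins (not learned parameters).
mu, b_u, b_i = 3.5, 0.2, -0.1
p_u = np.array([0.3, -0.5])            # explicit student factors
q_i = np.array([0.8, 0.1])             # course factors
Y_Nu = np.array([[0.1, 0.0],           # implicit vectors of the two
                 [0.3, 0.2]])          # courses the student interacted with

print(round(predict_svdpp(mu, b_u, b_i, p_u, q_i, Y_Nu), 4))
```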
2.1.2. Preference Calculation Method Based on AHP
To analyze the hidden bias of students' preference for courses, data on students browsing course introductions, bookmarking courses, and studying courses were first collected. Then, the analytic hierarchy process (AHP) is used to calculate the value of students' preference for courses. The student preference problem is divided into a goal layer, a criterion layer, and a scheme layer, with browsing course introductions, bookmarking courses, and studying courses as the decision criteria. The hierarchical model of student interest calculation is shown in Figure 1.
The judgment matrix C is constructed by pairwise comparison of the different factors. Bookmarking a course is more important than browsing its introduction, and studying the course content is more important than bookmarking it. Therefore, the scale is set as follows: the ratio of bookmarking to browsing is 3, the ratio of studying to browsing is 5, and the ratio of studying to bookmarking is 3. The judgment matrix C is therefore
\[ C = \begin{pmatrix} 1 & 1/3 & 1/5 \\ 3 & 1 & 1/3 \\ 5 & 3 & 1 \end{pmatrix}. \]
The maximum eigenvalue of the matrix is \( \lambda_{\max} \approx 3.04 \), and the corresponding normalized eigenvector is \( w \approx (0.11, 0.26, 0.63)^{T} \). The consistency index is \( CI = (\lambda_{\max} - n)/(n - 1) \approx 0.02 \), and Table 1 shows that the average random consistency index for n = 3 is RI = 0.58. The calculated consistency ratio is therefore \( CR = CI/RI \approx 0.03 \), which passes the consistency test (CR < 0.1). Therefore, the weight of browsing course introductions is 0.11, the weight of bookmarking courses is 0.26, and the weight of studying courses is 0.63.
The degree of students' preference for course operations is quantified. The calculation formula of a student's preference value is as follows:
\[ I = 0.11\, U_1 + 0.26\, U_2 + 0.63\, U_3, \]
where \( U_1 \) indicates whether the student browsed the course introduction, \( U_2 \) whether the student bookmarked the course, and \( U_3 \) whether the student studied the course. In this way, the analytic hierarchy process (AHP) converts student behaviour data into course preference values.
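The AHP computation above can be reproduced in a few lines. The sketch below builds the judgment matrix from the stated ratios, extracts the principal eigenvector as the weight vector, checks the consistency ratio, and combines behaviour indicators into a preference value (the function name and the 0/1 indicator encoding are assumptions for illustration):

```python
import numpy as np

# Judgment matrix C built from the pairwise ratios stated in the text:
# bookmark:browse = 3, study:browse = 5, study:bookmark = 3.
C = np.array([
    [1.0, 1 / 3, 1 / 5],   # browse course introduction
    [3.0, 1.0, 1 / 3],     # bookmark course
    [5.0, 3.0, 1.0],       # study course
])

eigvals, eigvecs = np.linalg.eig(C)
k = np.argmax(eigvals.real)
lam_max = eigvals.real[k]        # maximum eigenvalue, ~3.04
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                     # normalized weights, ~(0.11, 0.26, 0.63)

CI = (lam_max - 3) / (3 - 1)     # consistency index
RI = 0.58                        # average random index for n = 3
CR = CI / RI                     # consistency ratio; must be < 0.1

def preference(u_browse, u_bookmark, u_study, w=w):
    """Weighted sum of hypothetical 0/1 behaviour indicators U1, U2, U3."""
    return w @ np.array([u_browse, u_bookmark, u_study])

print(w.round(3), round(CR, 3))
```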
2.1.3. Markov Decision-Making Process
The Markov decision process is an intuitive and basic model in decision-theoretic planning, reinforcement learning, and stochastic domains. In this model, the environment is modelled through a set of states and a set of actions that can be executed to control the system's state. The goal of controlling the system in this way is to maximize some performance criterion. Many problems have been successfully modelled as Markov decision processes, such as multiagent problems, robot learning control, and game playing. Therefore, the Markov decision process has become a standard method for solving sequential decision problems.
A general Markov decision process is represented by a quintuple ⟨S, C, U, γ, R⟩, as shown in Figure 2, where S represents the state space, C the action space, U the state transition probability, γ the discount factor, and R the return function. The agent senses the state information in the current environment and chooses to perform some action according to the current state. The environment sends a reward or punishment signal to the agent based on the selected action, and on the basis of this signal the agent moves from one state to the next.
To optimize the SVD++ recommendation model by the reinforcement learning method, the mapping relationship between the recommendation prediction model and the Markov decision process should be established first. To construct the Markov decision process, students' preference scores for courses under different timestamps are converted into quintuples. The mapping between course scoring and the Markov decision process designed in this paper is as follows.
(1) State space S: the score that student p gives a course at time n is taken as the state s_n. Because students' course scores in the data set are integers in the range [1, 5], s_n also lies in [1, 5]. The states under all timestamps constitute the state space S.
(2) Action space C: student p watches a course at time n and gives it a grade, and this grade affects the grade given at time n + 1. The grade is therefore recorded as the action c_n that transfers state s_n to state s_{n+1}, as shown in the following equation. The actions at all times constitute the action space C.
(3) State transition probability U: the action taken by student p in state s_n is determined by the timestamp. Once the action is determined, the next state is also determined, so the transition between states is deterministic; that is, U = 1. The value range of the action is [1, 5].
(4) Discount factor γ: in the model, each action generates a corresponding reward. However, the same student's viewing times have different influences on the selection of the next course, and γ is a factor reflecting this influence. The later the reward, the greater the discount, and the return is always finite; therefore, 0 ≤ γ < 1.
(5) Reward and punishment function R: the function value represents the reward for completing an action in a state. This paper defines the reward and punishment function in terms of the course score of student p at time (n + 2), as given in (12).
Here, the predicted grade of student p for course x is calculated using an SVD or SVD++ model. The reward and punishment value obtained when student p takes action c_n in state s_n follows from this definition, and the corresponding reward and punishment table can be obtained from the reward and punishment function.
According to the Markov decision process described above, the action of transferring from one state to the next corresponds to the score of the course at the next time. Although the name and type of the course are ostensibly ignored, students' preferences for the course are implicitly reflected in the timestamps. This process organizes the course data set into the form shown in Table 2, where the first number in parentheses is the grade given by the student in the corresponding row to the course in the corresponding column, and the second number is the timestamp, that is, the chronological order in which that student watched that course. For example, the entry (4, 3rd) in row 1, column 1 indicates that student 1 watched course 1 third in chronological order, so the timestamp is n = 3 and student 1 scored 4 for course 1. NaN indicates that the student did not watch the course.
Sorting the data in Table 2 by timestamp generates the state transition paths; the rule is illustrated with the first example. The state transition path 4⟶5⟶4⟶2 in row 1 reflects that student 1 watches course 3 at timestamp n = 1 and grades it 4; watches course 2 at n = 2 and grades it 5; watches course 1 at n = 3 and grades it 4; and watches course 5 at n = 4 and grades it 2. The other four transfer paths are obtained in the same way.
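Extracting a state transition path from timestamped ratings amounts to a sort. The sketch below assumes a toy per-student log in the layout of Table 2 (score, timestamp-order pairs; unwatched courses simply absent):

```python
# Toy rating log for one student (layout assumed from Table 2):
# course -> (score, timestamp order); unwatched courses are absent.
ratings = {
    "course3": (4, 1),
    "course2": (5, 2),
    "course1": (4, 3),
    "course5": (2, 4),
}

def transition_path(ratings):
    """Sort the (score, timestamp) pairs by timestamp and return the score
    sequence, i.e. the Markov state transition path for this student."""
    ordered = sorted(ratings.values(), key=lambda sc: sc[1])
    return [score for score, _ in ordered]

print(transition_path(ratings))  # [4, 5, 4, 2], matching the path 4->5->4->2
```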
This state transition path represents the state transition in Markov decisionmaking process and guides the updating direction of V table.
2.2. Training and Prediction Process
The recommendation algorithm proposed in this paper includes two parts: training and prediction. During training, the SVD++ algorithm is first used for model training on the training set, yielding the SVD++ recommendation model shown in formula (3). Then the reinforcement learning model is trained: the reward and punishment function shown in (12) is used to calculate the reward and punishment values of state transfers, and the reinforcement learning table is updated to obtain the optimization model of the SVD++ recommendation score. During prediction, the prediction score is first obtained from the SVD++ recommendation model; the optimization model designed in this paper is then used to optimize it, giving the final prediction score. The optimization model designed in this paper is expressed as follows, where the first term is the predicted score of student p for course x calculated by the SVD++ recommendation model; s is the grade of the course student p watched at timestamp (n − 2), before watching course x; c is the grade of the course watched at timestamp (n − 1); and V(s, c) is the value of the V table at those coordinates, obtained by the reinforcement learning algorithm and combined with the SVD++ prediction score to optimize the final predicted score. If the value of V(s, c) does not exist, the mean value of the current V table is taken. The result is the optimized prediction score.
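The exact form of equation (13) is not reproduced in the text, so the sketch below assumes a simple additive correction in which the SVD++ prediction is adjusted by the V-table entry indexed by the two preceding grades, with the mean fallback described above; both the additive form and the clipping to the rating range are assumptions:

```python
import numpy as np

def optimize_score(svdpp_score, V, s_prev, c_prev):
    """Adjust the SVD++ prediction with the V-table entry indexed by the
    grades at timestamps (n - 2) and (n - 1); scores are integers 1-5.
    The additive form is an assumption standing in for equation (13)."""
    entry = V[s_prev - 1, c_prev - 1]
    if np.isnan(entry):               # missing entry: use the table mean
        entry = np.nanmean(V)
    return float(np.clip(svdpp_score + entry, 1.0, 5.0))

V = np.zeros((5, 5))
V[3, 4] = 0.25                         # learned bonus for the path 4 -> 5
print(optimize_score(4.0, V, s_prev=4, c_prev=5))  # 4.25
```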
2.2.1. Training Process
Firstly, the training set is trained through (3), and the SVD++ recommendation model is obtained. Then the reinforcement learning model is trained. Formula (12) is used to calculate the reward and punishment values, which are then used in the updating process of the V values in the Q-learning procedure (Algorithm 1). The V-table update formula is as follows:
\[ V(s, c) \leftarrow V(s, c) + \alpha \left[ R + \gamma \max_{c'} V(s', c') - V(s, c) \right], \]
where V is a 5 × 5 table whose entries start at 0, V(s, c) is the value of V at coordinates (s, c), R is the reward or punishment for choosing the next action, α is the learning rate, and γ is the discount factor. The higher the V value, the greater the reward obtained by performing the corresponding next action.
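The V-table update can be sketched as a Q-learning-style rule; the convention that the next state equals the chosen action follows the mapping in Section 2.1.3, while the learning rate and discount factor values are illustrative:

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.6    # learning rate and discount factor (illustrative)

def update_v(V, s, c, reward):
    """Q-learning-style update of the 5 x 5 V table: move V[s, c] toward
    the reward plus the discounted best value reachable from the next
    state. The next state equals the action taken (scores are 1-5)."""
    best_next = V[c - 1].max()         # row of the next state s' = c
    V[s - 1, c - 1] += ALPHA * (reward + GAMMA * best_next - V[s - 1, c - 1])
    return V

V = np.zeros((5, 5))                   # V table starts at 0
update_v(V, s=4, c=5, reward=1.0)
print(V[3, 4])                         # 0.1 after one update
```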

2.2.2. Prediction Process
The prediction process takes the prediction score obtained from the SVD++ recommendation model and combines it with the trained V table to predict the score of student p on course x. At the same time, it can predict scores for courses that student p has not watched but other students have.
In addition, the grade at the current timestamp is not used in the prediction score optimization model shown in (13). If it were used, the optimization model would depend on the very score that needs to be predicted; therefore, the preceding grades are chosen instead.
The data set used in this paper has default values, that is, courses with no grading information. According to the construction idea of the optimization model in this paper, the scoring information of ungraded courses would be needed in the subsequent optimization process. This can lead to missing s or c values, which would invalidate the optimization model. To avoid this situation, the SVD++ model is adopted to predict the missing values, which are then rounded and filled in, solving the problem of sparse data.
In addition, when n = 1 or 2, the indices (n − 2) and (n − 1) exceed the subscript range of s and c, and there are no corresponding values. Therefore, the prediction scores of the last two columns are used as the prediction score data of columns 0 and 1 in this paper to ensure data integrity.
3. Experiment and Analysis
3.1. Experimental Data Set
To verify the actual effect of the proposed algorithm on the recommendation model of ideological and political courses in colleges and universities, the sklearn library with a Python 3.6.5 kernel, obtained from the GitHub open-source platform, is adopted. The main library packages used are NumPy 1.14.3, NetworkX 2.3, Keras 2.0.5, and sklearn 0.20.0. The experiment was conducted in an Ubuntu 16.04 environment. Multidimensional constraints such as scale, feature, target, and noise are set programmatically, and the sklearn library is used to generate simulated data sets that meet these conditions. The simulated data set contains information on 570 ideological and political education courses, 13,000 college students, and 50,000 records of college students' operations on and grading of courses.
3.2. Performance Indexes
To evaluate the recommendation performance of the proposed algorithm, MAE and RMSE are used as evaluation indexes.
MAE represents the average absolute error between the predicted values and the true values. The smaller the MAE, the higher the recommendation accuracy. It is defined as follows:
\[ \mathrm{MAE} = \frac{1}{t} \sum_{i=1}^{t} \left| \hat{r}_i - r_i \right|. \]
RMSE represents the square root of the ratio of the sum of squared deviations between the predicted values and the true values to the number of predictions t. RMSE reflects the dispersion degree of the samples, and the smaller the RMSE, the higher the recommendation accuracy. It is defined as follows:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{t} \sum_{i=1}^{t} \left( \hat{r}_i - r_i \right)^2}, \]
where \( \hat{r}_i \) is the predicted score value, \( r_i \) is the true value, and t is the number of predicted scores.
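Both indexes are straightforward to compute; a small self-contained sketch with invented scores:

```python
import numpy as np

def mae(pred, true):
    """Mean absolute error over t predicted scores."""
    return np.abs(np.asarray(pred) - np.asarray(true)).mean()

def rmse(pred, true):
    """Root of the mean squared deviation over t predicted scores."""
    return np.sqrt(((np.asarray(pred) - np.asarray(true)) ** 2).mean())

pred = [4.2, 3.0, 5.0, 2.5]   # invented predicted scores
true = [4.0, 3.5, 4.0, 2.0]   # invented true scores
print(round(mae(pred, true), 3), round(rmse(pred, true), 3))
```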
3.3. Feasibility Analysis
Since the learning rate α and discount factor γ can be adjusted dynamically, it is necessary to study the influence of the changes of α and γ in the algorithm in this paper on the prediction performance. The experimental results are shown in Figure 3. As can be seen from Figure 3, when α is constant, γ increases from 0.4 to 0.6, and RMSE of the proposed algorithm decreases continuously. The results were best when α = 0.000003 and γ = 0.6. At this time, RMSE can reach 0.81948, which is 0.0086 lower than before. Thus, the feasibility of the proposed algorithm is proved. This configuration is used for comparison in subsequent experiments.
3.4. Comparison Experiment
To verify the effectiveness of the algorithm in this paper, representative matrix decomposition algorithm [23], direct trust matrix decomposition algorithm [24], and indirect social relationship matrix decomposition algorithm [25] were selected for comparison in the experiment. All experiments were carried out for 10 times, and the experimental results were averaged.
Tables 3 and 4 show the error values of different algorithms. The error value of the algorithm in this paper is lower than that of the other comparison algorithms, and the gap becomes more obvious as the training set grows. According to Tables 3 and 4, the more training data, the lower the error of each algorithm. Compared with the algorithm of [23], the algorithm of [24] reduces the error value by introducing direct social relations. On the basis of [24], the algorithm of [25] establishes indirect social relations by considering the degree of influence between students; this relationship includes both direct and indirect connections between students, so the error value is further reduced. However, these algorithms only analyze the degree of influence among students, ignoring the potential connections between courses and the different degrees of influence that different neighbours exert on different nodes. The algorithm in this paper constructs preference vectors and feature vectors according to the influence of scoring and behaviour from the perspectives of both students and courses. At the same time, considering the time effect of student scoring, the recommendation model is transformed into a Markov decision process. Therefore, the proposed algorithm achieves higher recommendation accuracy than the other algorithms.
To verify whether the algorithm in this paper can reduce the iterative convergence times, the original data set is taken as the training set, and the experimental results are shown in Figure 4. It can be seen that compared with other comparison algorithms, the iteration times of the algorithm in this paper quickly reach convergence. Moreover, the recommendation efficiency of the proposed algorithm is significantly higher than other algorithms. This is because the algorithm in this paper fully mines the potential connections between them from the perspectives of students and courses, so it not only improves the recommendation efficiency but also reduces the convergence times.
At the beginning of the iteration, the error values of the comparison algorithms are about 1.38, while Figure 4 shows that the error value of the zeroth iteration of the algorithm in this paper is already lower than 1.38. As the number of iterations increases, the algorithm of [23] reaches convergence only after 20 iterations and retains the largest error. The algorithms of [24] and [25], which indirectly correct students' preference vectors, reach convergence after 15 iterations, with correspondingly better recommendation efficiency. Because the algorithm in this paper optimizes the preference vector and feature vector from the two perspectives of students and courses, it reaches convergence by the fourth iteration with a small error value. These experimental results further verify the proposed algorithm's recommendation accuracy and convergence performance.
4. Conclusion
The ideological and political education curriculum runs through the whole process of higher education teaching and is an essential system for universities to cultivate high-quality talents. To realize personalized recommendation of ideological and political education courses, this paper proposes a recommendation algorithm based on the analytic hierarchy process and an improved collaborative filtering algorithm. The action of students scoring courses under different timestamps is transformed into a Markov decision process. Through the analytic hierarchy process, student behavior data are converted into course preference values. The collaborative filtering algorithm is combined with the reinforcement learning reward and punishment process, and the prediction effect is improved by adjusting the influence factors. Experimental results verify the feasibility of the proposed system and show that it has good accuracy and convergence performance. How to improve the accuracy of the algorithm when training data are insufficient is future research work.
Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.