#### Abstract

To address the lack of accurate course recommendation and selection on online teaching platforms in the new form of higher education, an online course recommendation system based on a double-layer attention mechanism is proposed. First, the collected data are preprocessed, and the student and course information is normalized and classified. Then, the double-layer attention mechanism is introduced into a parallel neural network recommendation model to improve the model's ability to mine important features. Next, TF-IDF (term frequency-inverse document frequency) is improved by incorporating student scores and course categories. The recommendation results are classified according to the weights of the course categories to construct groups of different course types, which completes the recommendation. The experimental results show that the proposed algorithm effectively improves recommendation accuracy compared with other algorithms.

#### 1. Introduction

At present, online learning has become a common way of learning. Students are no longer confined to the physical classroom; instead, they can freely choose when and where to learn, breaking through the traditional constraints of time and space [1]. Online learning platforms have emerged one after another, bringing great convenience to learners. In the context of the new form of intelligent education [2, 3], teaching platforms that recommend online courses by analyzing historical behavioral data have become the mainstream of the “Internet + education” model [4].

In the online learning environment, students need to spend a great deal of time searching and browsing to determine whether they are interested in the resources, so the problem of knowledge overload inevitably arises. An effective way to solve this problem is to use a personalized recommendation algorithm [5]. After years of development, the performance of recommendation systems has greatly improved. Current recommendation algorithms are mainly divided into collaborative filtering-based [6], content-based [7], and hybrid [8] recommendation algorithms. Collaborative filtering-based algorithms find similar users from their historical behavior and rating data, capture the preferences of these similar users, and recommend items that the target user has not seen before. Content-based algorithms use the items a user has selected or rated, together with the user's historical behavior, to find similar items to recommend. Hybrid algorithms fuse different recommendation algorithms to obtain a better recommendation effect. As data grow, so does the variety of data types, and traditional recommendation algorithms cannot learn the deep features of users or items. How to make full use of multisource heterogeneous data to improve the performance of recommendation systems has therefore become a hot topic in recommendation system research [9].

In recent years, owing to its strong learning ability, deep learning has been widely applied in image recognition [10, 11], speech recognition [12, 13], natural language processing [14, 15], and other fields [16]. Deep learning is good at mining the deep features of multisource heterogeneous data; combined with a recommendation system, it can learn the hidden features of user and item attributes more efficiently [17]. Therefore, more and more researchers apply deep learning to recommendation systems. Although applying neural networks to recommendation systems can effectively improve recommendation performance, not all feature interactions contribute to the prediction results. For example, when user or item features interact with useless features, noise may be introduced that degrades the performance of the recommendation system [18].

Teaching recommendation algorithms are mainly based on content-based recommendation and collaborative filtering. For example, Yao et al. [19] applied three-dimensional biased weight tensor decomposition to teaching recommendation. Jie-Guang et al. [20] introduced a teaching optimization and research mode with a rectangular neighborhood structure. Hou et al. [21] evaluated WebGIS course teaching with a dynamic adaptive teaching and learning optimization algorithm. Guan et al. [22] designed and implemented an open platform for dynamic mathematical and digital resources.

To address the lack of accurate course recommendation and selection on online teaching platforms, the main work and innovations of this paper are as follows:

(1) An online course recommendation system based on a double-layer attention mechanism is proposed. Introducing the double-layer attention mechanism into the parallel neural network recommendation model improves the model's ability to mine important features.

(2) The first attention layer reassigns the weights of the preprocessed course text information. The student feature vectors and course feature vectors learned by a multilayer fully connected neural network are input into the second attention layer, and a multilayer perceptron is used to parameterize the attention score.

Finally, course recommendation experiments are conducted on the course attribute dataset, and the accuracy of the proposed method is verified through comparison with other methods.

The structure of this paper is as follows. Section 2 presents the proposed algorithm model. Section 3 describes the experiments and analysis. Section 4 concludes the paper.

#### 2. The Proposed Model

The neural collaborative filtering (NCF) model uses parallel neural networks to learn the latent feature vectors of users and items separately [23]. In the prediction layer, the latent vectors are mapped to the predicted value by a multilayer perceptron (MLP), which extracts higher-order feature information and improves the recommendation ability of the model. However, not all feature interactions contribute to the prediction results, and NCF ignores the different effects that the items in the historical interaction sequence have on the current prediction. Therefore, this paper introduces the attention mechanism into the neural network to assign personalized weights to the items in the historical interaction sequence, thereby improving the model.

Building on the NCF recommendation model, the attribute information of students and of online courses is taken as the two input streams, $x_u$ for student $u$ and $x_i$ for online course $i$. The architecture of the proposed model is shown in Figure 1.

A double-layer attention mechanism is introduced. The first layer is combined with a convolutional neural network (CNN) to build a subnetwork, so that the CNN can learn the key content in the online course text. The second layer takes the student and online course feature vectors as input and uses the attention mechanism to assign personalized weights to the online courses in the student's interaction history, which captures the effect of each course on the current preference prediction. The recommendation results are then grouped and shown to students as online course groups, which makes the recommended content better organized.

##### 2.1. Learning the Underlying Characteristics

To alleviate the data sparsity problem in recommendation systems, the attribute information of students and online courses is used to predict scores. After data preprocessing, the attribute information of students and online courses is fed into the embedding layer for encoding. The embedding layer maps each sparse input vector to a dense low-dimensional embedding vector, yielding the embedding representations $e_u$ and $e_i$ of the attributes of student $u$ and online course $i$. The embeddings are initialized randomly at the start of training and are updated as training proceeds, so that they help the neural network perform its task.

The embedding vectors $e_u$ and $e_i$ of students and online courses are input into parallel multilayer fully connected neural networks to learn the latent feature vectors of the nontext attributes of students and online courses, respectively:

$$p_u = f\left(W_u e_u + b_u\right), \qquad q_i^{a} = f\left(W_i e_i + b_i\right),$$

where $f$ is the tanh activation function, and $W_u, W_i$ and $b_u, b_i$ are the weight matrices and biases to be learned, respectively (the mapping is applied at each layer).
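One branch of the parallel network can be sketched in plain Python so that it runs without TensorFlow. The sketch assumes, for illustration only, that each preprocessed attribute is an integer id looked up in a randomly initialized embedding table, that the looked-up vectors are concatenated, and that the branch is a stack of tanh fully connected layers. The names `init_embedding`, `embed`, `dense_tanh`, and `tower` are hypothetical and are not the author's code.

```python
import math
import random

def init_embedding(vocab_size, dim, seed=0):
    """Randomly initialized embedding table with one row per attribute value id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]

def embed(table, ids):
    """Look up each attribute id and concatenate the dense vectors."""
    out = []
    for idx in ids:
        out.extend(table[idx])
    return out

def dense_tanh(x, W, b):
    """One fully connected layer h = tanh(W x + b); W is a list of rows (out x in)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b_k)
            for row, b_k in zip(W, b)]

def tower(x, layers):
    """Apply a stack of (W, b) tanh layers: one branch of the parallel network."""
    h = x
    for W, b in layers:
        h = dense_tanh(h, W, b)
    return h

# Usage: two hypothetical student attribute ids with 4-dimensional embeddings.
table = init_embedding(vocab_size=10, dim=4)
e_u = embed(table, [2, 7])                        # 8-dimensional input vector
layers = [([[0.1] * 8, [-0.1] * 8], [0.0, 0.0])]  # one layer with 2 output units
p_u = tower(e_u, layers)
```

The course branch has the same shape with its own weights; because tanh is bounded, every latent component lies in (-1, 1).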

##### 2.2. Convolutional Neural Network with Attentional Mechanism

The text attributes of an online course include its title. To enhance the network's ability to learn the key content in the text, the attention mechanism is combined with the CNN to form a subnetwork that extracts text features. This text CNN consists of an attention layer, a convolutional layer, a pooling layer, and a fully connected layer, as shown in Figure 2.

The attention layer assigns attention weights to the word vector matrix of each online course text to obtain an updated word vector matrix. The word vector matrix is obtained by embedding the text content of the online courses: $K$ is the dimension of the word vectors, that is, every word is mapped to a $K$-dimensional vector, and $n$ is the number of words. $F = [f_1, f_2, \dots, f_n] \in \mathbb{R}^{n \times K}$ denotes the word vector matrix of the text carried by all online courses browsed by the target student $u$, where $f_i$ is the word vector of the $i$th word. For each word vector $f_i$ of the target student, an attention score $s_{ij}$ is computed against each word vector $E_j$ in the word vector matrix $E$ of all online course texts, using a weight matrix and a bias that are training parameters.

The attention scores $s_{ij}$ are normalized by the Softmax function to obtain the attention weight $a_{ij}$ of each word vector:

$$a_{ij} = \frac{\exp\left(s_{ij}\right)}{\sum_{k} \exp\left(s_{ik}\right)},$$

where $a_{ij} \in A$ is an attention weight value and $A$ is the attention weight matrix. The attention weight matrix $A$ is then concatenated (spliced) with the original word vector matrix $F$, and the resulting updated online course word vector matrix is used as the input matrix of the convolutional neural network:

$$\tilde{F} = A \oplus F,$$

where $\oplus$ denotes concatenation.
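The normalization and splicing steps can be sketched in plain Python. The sketch takes the raw attention scores as given (a matrix `S`, where `S[i][j]` scores word `i` of the target student's text against word `j` of all course texts; the exact scoring function is not reproduced), normalizes each row with a numerically stable softmax, and appends each row of the weight matrix to the matching row of the word vector matrix. It assumes, for illustration, that both matrices have the same number of rows; the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtracting the max does not change the result."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(S):
    """Row-wise softmax over the words j for each word i of the target student's text."""
    return [softmax(row) for row in S]

def splice(A, F):
    """Concatenate each row of the attention matrix A with the same row of F."""
    return [a_row + f_row for a_row, f_row in zip(A, F)]
```

Each row of the spliced matrix therefore has `len(A[0]) + K` entries, which is the width the convolutional layer then sees.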

In the convolutional layer, each neuron applies a convolution kernel $W^{y}$ that slides over the input matrix $\tilde{F}$ (the updated word vector matrix) from the leftmost position to the right along the direction of the sentence. The window size of the kernel is $m$, and the convolution produces a feature representation of each word in the sentence; the activation function then forms the feature map. The feature produced by the $y$th neuron is

$$c^{y} = f\left(W^{y} * \tilde{F} + b^{y}\right),$$

where $*$ is the convolution operation, $b^{y}$ is the bias term, and $f$ is the nonlinear activation function ReLU, which enhances the nonlinearity of the convolutional neural network:

$$f(x) = \max(0, x).$$
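A minimal sketch of this convolution step, assuming a single kernel whose width equals the word vector dimension *K*, so that it slides only along the word axis. As in deep learning frameworks, the "convolution" is computed without flipping the kernel. `conv1d_text` is a hypothetical name; a real model would use a framework convolution layer with many kernels.

```python
def conv1d_text(F, kernel, bias):
    """Slide an m x K kernel down the n x K word matrix F (one step = one word).

    Returns the n - m + 1 features c_t = ReLU(<kernel, F[t:t+m]> + bias)."""
    m, n = len(kernel), len(F)
    features = []
    for t in range(n - m + 1):
        window = F[t:t + m]
        # Element-wise product of kernel and window, summed over all m x K entries.
        s = sum(k * x
                for k_row, f_row in zip(kernel, window)
                for k, x in zip(k_row, f_row))
        features.append(max(0.0, s + bias))  # ReLU activation
    return features
```

For a 3-word text and a window of *m* = 2, the feature map has 2 entries, one per window position.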

Max pooling is applied to the output of the convolutional layer. The feature map is divided into several regions, and the maximum value of each region is output. Max pooling discards features in each region that are unimportant or redundant for the current task and retains the information that best expresses the text features. The pooling result for the feature map $c^{y}$ of the $y$th convolution kernel is

$$d_{y} = \operatorname{maxpool}\left(c^{y}\right).$$
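Region-wise max pooling over one feature map can be sketched as follows. The region size is a hypothetical parameter, since the text does not state it; choosing a region as long as the whole map gives the common "max-over-time" pooling used in text CNNs.

```python
def max_pool(feature_map, region):
    """Split a feature map into consecutive regions of `region` values and keep each maximum.

    The last region is shorter when len(feature_map) is not a multiple of `region`;
    setting region = len(feature_map) reduces this to a single global maximum."""
    if region < 1:
        raise ValueError("region must be a positive integer")
    return [max(feature_map[i:i + region]) for i in range(0, len(feature_map), region)]
```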

The pooled output $d = [d_1, d_2, \dots]$ is fed into the fully connected layer, multiplied by its weight matrix, and added to a bias term. After the ReLU activation function, the hidden feature of the online course text information is obtained:

$$q_i^{t} = \operatorname{ReLU}\left(W_f d + b_f\right),$$

where $W_f$ is the weight matrix of the fully connected layer and $b_f$ is the bias term. The nontext attribute feature vector $q_i^{a}$ of online course $i$ is then concatenated with the text feature vector $q_i^{t}$ to obtain the online course feature:

$$q_i = q_i^{a} \oplus q_i^{t},$$

where $\oplus$ denotes concatenation.
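The fully connected layer and the final concatenation can be sketched in plain Python. The function names are hypothetical, and the weights in the test are toy values chosen only so the arithmetic is easy to check.

```python
def dense_relu(x, W, b):
    """Fully connected layer with ReLU: max(0, W x + b); W is a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b_k) for row, b_k in zip(W, b)]

def course_feature(q_attr, pooled, W_f, b_f):
    """Concatenate the nontext attribute vector with the text feature vector."""
    q_text = dense_relu(pooled, W_f, b_f)
    return q_attr + q_text  # list concatenation plays the role of the splice operator
```

With pooled features `[1.0, 2.0]`, the rows `[1, 1]` and `[-1, 0]` give pre-activations 3.0 and -1.0, so the text part becomes `[3.0, 0.0]` after ReLU.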

##### 2.3. Prediction Score

Traditional recommendation models typically let the latent representation $p_u$ of student characteristics interact with the latent representation $q_i$ of online course characteristics to obtain the final predicted score. Because these models are not customized for the recommendation task, treating all historical online courses equally limits their representation ability. Traditional neural network recommendation models ignore that different online courses in a student's history play different roles in predicting the next online course, which lowers their accuracy.

In the prediction layer of the proposed model, a neural attention network distinguishes the importance of historical online courses to overcome this limitation of traditional neural network recommendation models. The learned latent representations $p_u$ of student characteristics and $q_i$ of online course characteristics are the inputs of the attention layer, because the target student's attention to different online courses affects the prediction of the next online course differently. The attention score $s_{ui}$ of student $u$ for online course $i$ is computed by a perceptron over $p_u$ and $q_i$ with a weight matrix $W$ and a bias term $b$ to be learned; its ReLU activation captures the nonlinear relation by which the current online course predicts the next one. A larger $s_{ui}$ indicates that student $u$ pays more attention to online course $i$, so that course plays a more important role in predicting the next online course. The Softmax function normalizes the attention scores:

$$\alpha_{ui} = \frac{\exp\left(s_{ui}\right)}{\sum_{j \in R(u)} \exp\left(s_{uj}\right)},$$

where $\alpha_{ui}$ is the contribution of online course $i$ to the student's preference and $R(u)$ is the set of online courses in the interaction history of student $u$. The attention weight $\alpha_{ui}$ is then used to reweight the course feature:

$$\tilde{q}_i = \alpha_{ui}\, q_i.$$
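The second attention layer can be sketched as below. The exact form of the scoring perceptron is not spelled out, so the sketch assumes, as a labeled simplification, a single layer with a weight vector `w` applied to the concatenation of the student and course vectors, followed by ReLU. The softmax then runs over the student's history set (written `history`, standing for R(u)), and each historical course vector is scaled by its weight. All names are hypothetical.

```python
import math

def attention_score(p_u, q_j, w, b):
    """Hypothetical one-layer perceptron score: ReLU(w . [p_u; q_j] + b)."""
    z = p_u + q_j  # concatenate student and course vectors
    return max(0.0, sum(wi * zi for wi, zi in zip(w, z)) + b)

def history_attention(p_u, history, w, b):
    """Softmax-normalize scores over the history set and reweight each course vector.

    history: {course_id: course feature vector} for the courses in R(u)."""
    scores = {j: attention_score(p_u, q_j, w, b) for j, q_j in history.items()}
    m = max(scores.values())  # stabilize the exponentials
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    alpha = {j: e / total for j, e in exps.items()}
    reweighted = {j: [alpha[j] * x for x in q_j] for j, q_j in history.items()}
    return alpha, reweighted
```

With all-zero weights every course gets the same score, so the attention reduces to a uniform average over the history; nonzero weights shift the mass toward the courses that the perceptron scores highest.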

The predicted score is obtained as the inner product of the student feature $p_u$ and the reweighted online course feature $\tilde{q}_i$:

$$\hat{r}_{ui} = p_u^{\top} \tilde{q}_i.$$

The mean squared error (MSE) is used as the loss function to minimize the gap between the real and predicted scores during training:

$$\mathcal{L} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \left(r_{ui} - \hat{r}_{ui}\right)^{2},$$

where $r_{ui}$ is the real score of student $u$ for online course $i$, $\hat{r}_{ui}$ is the predicted score, and $\mathcal{D}$ is the set of scored student-course pairs in the training set. The objective function is minimized with stochastic gradient descent, and the weights and biases of each layer are updated by backpropagation. After training, the model predicts students' scores for online courses they have not yet scored, and online courses are recommended to the target students in descending order of predicted score. These recommendation results are then grouped to realize online course group recommendation.
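The prediction, loss, and one stochastic gradient step can be illustrated on a deliberately reduced model. The sketch assumes, for illustration only, that the student and course vectors are free parameters; in the full model the same gradient would be backpropagated further into the towers, the CNN, and the attention layers. The gradient of $(r - p^\top q)^2$ with respect to $p$ is $-2(r - p^\top q)\,q$, so descent adds $2\eta(r - p^\top q)\,q$.

```python
def predict(p_u, q_i):
    """Predicted score as the inner product of student and course feature vectors."""
    return sum(a * b for a, b in zip(p_u, q_i))

def mse(pairs):
    """Mean squared error over (real_score, predicted_score) pairs."""
    return sum((r - r_hat) ** 2 for r, r_hat in pairs) / len(pairs)

def sgd_step(p_u, q_i, r, lr):
    """One SGD step on (r - p_u . q_i)^2 with respect to both vectors.

    Both updates use the old vectors, as a simultaneous gradient step should."""
    err = r - predict(p_u, q_i)
    new_p = [p + lr * 2 * err * q for p, q in zip(p_u, q_i)]
    new_q = [q + lr * 2 * err * p for p, q in zip(p_u, q_i)]
    return new_p, new_q
```

Starting from `[0.5, 0.5]` for both vectors with a real score of 4.0 and a learning rate of 0.1, one step moves the prediction from 0.5 to about 1.445, closer to the target.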

##### 2.4. Improved TF-IDF Method

TF-IDF is widely used for text classification and information retrieval. It generally considers only the number of documents and the frequency with which keywords appear in them; when words are associated with score data, standard TF-IDF cannot use these scores to compute more accurate values. The improved method therefore introduces score data into TF-IDF so that words with high scores are not lost when evaluating word importance:

$$w_{t,u} = \frac{S_t}{S} \times \log\frac{N}{n_t}, \tag{15}$$

where $w_{t,u}$ is the importance of word $t$ to student $u$; $S_t$ is the total score of the files containing $t$; $S$ is the sum of the scores of all files; $N$ is the total number of files in the database; and $n_t$ is the number of files containing $t$. For the online course dataset, the scores are computed from the score data of the designated student's historical online courses in the training set. The improved TF-IDF is used to compute the value of each online course type contained in the recommendation results, which reveals the student's preference for different types and enables group recommendation of online courses.

The first part of equation (15) is the term frequency. The scores of the online courses (files) containing the word $t$ are summed, and their proportion of the total score of all files is computed. A larger proportion indicates that the files containing $t$ have higher scores, which to some extent reflects the importance of $t$. The second part is the inverse document frequency: the more files in the database contain $t$, the weaker its importance. Multiplying the two parts gives the improved TF-IDF value, which indicates the student's preference for the word.

Based on the improved TF-IDF algorithm, this paper analyzes the types of online courses in the recommendation results, obtains the importance of each type to the student, and groups the recommendation results automatically. Online courses of the same type in the top-*N* recommendation results are placed in the same group, and the student's favorite types are recommended first so that students can quickly find content matching their interests. The improved TF-IDF algorithm is implemented as follows:

Step 1: compute the term frequency of the word $t$ in file $d$, that is, the proportion of $S_t$, the sum of the scores of the online courses containing $t$, to $S$, the sum of the scores of all online courses in the student's browsing history:

$$\mathrm{TF}_{t,d} = \frac{S_t}{S}.$$

Step 2: compute the inverse document frequency by dividing the number $N$ of online courses in the collection by the number $n_t$ of online courses containing $t$ and taking the logarithm of the quotient:

$$\mathrm{IDF}_{t} = \log\frac{N}{n_t}.$$

Step 3: multiply the term frequency by the inverse document frequency to obtain the TF-IDF value of $t$ in file $d$:

$$\text{TF-IDF}_{t,d} = \mathrm{TF}_{t,d} \times \mathrm{IDF}_{t}.$$
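The three steps can be sketched in plain Python. The sketch assumes each file is one online course represented as a set of words (for example, its course types) paired with the student's score for it; it uses the natural logarithm, which only rescales the values and does not change their ranking. `improved_tfidf` is a hypothetical name.

```python
import math

def improved_tfidf(docs):
    """Score-weighted TF-IDF for every word.

    docs: list of (words, score) pairs, one per online course in the student's history;
    `words` is a set of terms (e.g. course types), `score` the student's score for it."""
    total_score = sum(score for _, score in docs)
    n_docs = len(docs)
    vocab = set().union(*(words for words, _ in docs))
    weights = {}
    for t in vocab:
        containing = [score for words, score in docs if t in words]
        tf = sum(containing) / total_score        # Step 1: score share of files containing t
        idf = math.log(n_docs / len(containing))  # Step 2: log(N / n_t)
        weights[t] = tf * idf                     # Step 3: TF x IDF
    return weights
```

For example, with three courses scored 4.0 (`math`), 5.0 (`math`, `cs`), and 1.0 (`art`), the term `cs` and the term `art` share the same inverse document frequency, but `cs` gets the larger weight because the course containing it was scored higher.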

When computing term frequency, the improved TF-IDF uses score data to reflect the weight of a word in the file.

##### 2.5. Grouping the Recommended Results

The improved TF-IDF method is used to obtain students' preferences for different types of online courses. Taking online course recommendation as an example, the algorithm is tested on the CourseLens dataset; the word frequency information of online course types is shown in Table 1. First, $N$ online courses are recommended to the student. Second, the information documents of these online courses are collected. Third, the online course types contained in these documents are extracted to build online course type documents. Finally, word frequency statistics are computed over the type documents to obtain the word frequency information of each online course type in the recommendation results.

When the recommended online course $m_s$ contains online course type $k$, the indicator $R_{m_s,k}$ equals 1; otherwise it equals 0. The improved TF-IDF is applied to the word frequency information of the online course types to predict the student's preference for each type:

$$P_{u,k} = \frac{\sum_{s=1}^{N} r_{u,m_s} R_{m_s,k}}{\sum_{s=1}^{N} r_{u,m_s}} \times \log\frac{N}{\sum_{s=1}^{N} R_{m_s,k}},$$

where $r_{u,m_s}$ is the score of student $u$ for online course $m_s$, $R_{m_s,k}$ indicates whether online course $m_s$ contains type $k$, $N$ is the number of online courses in the recommendation results, and $P_{u,k}$ is the student's preference for type $k$.

The online course types are sorted in descending order of preference $P_{u,k}$, and the top $K$ types are taken as the names of the online course groups to be recommended, $L = \{L_1, L_2, \dots, L_K\}$, where $L$ is the set of online course groups to be recommended. Each group $L_k$ receives $D$ online courses of the corresponding type, drawn from the recommendation results generated by the proposed model and sorted in descending order of predicted score. The result is $K$ online course groups, each containing $D$ online courses of the same type.
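The grouping step can be sketched as follows, taking the per-type preferences as already computed. The text does not say whether a course with several types may appear in more than one group; this sketch assumes it may. `group_recommendations` and its parameters are hypothetical names.

```python
def group_recommendations(predicted, course_types, preference, K, D):
    """Form K online course groups of up to D courses each.

    predicted: {course: predicted score} for the top-N recommendation results.
    course_types: {course: set of types}; preference: {type: preference value}."""
    top_types = sorted(preference, key=preference.get, reverse=True)[:K]
    groups = {}
    for k in top_types:  # groups are created in descending order of preference
        members = [c for c in predicted if k in course_types[c]]
        members.sort(key=lambda c: predicted[c], reverse=True)
        groups[k] = members[:D]
    return groups
```

Because Python dictionaries preserve insertion order, iterating over the result yields the groups from the most preferred type to the least.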

#### 3. Experiment

##### 3.1. Experimental Environment and Dataset

The experiment was carried out in the PyCharm integrated development environment on 64-bit Windows 10. The versions of Python and the TensorFlow deep learning framework are 3.7.9 and 1.8.0, respectively. The computer has a 3.50 GHz Intel Core i7-7800 CPU and 32 GB of memory.

The experiment used data from an online learning platform. After collection and collation, the final dataset contains 500 students, 700 online courses, and 35,000 scoring records. Because the collected data include both structured and unstructured data, preprocessing is required. First, the fields are classified, and a different processing method is chosen for each class. Student ID, online course ID, number of studies, and grade value form one class. Student type, grade, and online course category are categorical information and form a second class, whose field values are then digitized. The names and descriptions of online courses are text information that cannot be directly quantified and form a third class.
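Digitizing a categorical field can be as simple as assigning each distinct value an integer id, which an embedding layer can then look up. The sketch below is an illustrative assumption, not the author's preprocessing code; the field name `category` and the helper names are hypothetical.

```python
def build_vocab(values):
    """Map each distinct categorical value to an integer id in order of first appearance."""
    vocab = {}
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab)
    return vocab

def encode_field(records, field):
    """Replace one categorical field by its integer id; the input records are not modified."""
    vocab = build_vocab(r[field] for r in records)
    encoded = [{**r, field: vocab[r[field]]} for r in records]
    return encoded, vocab
```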

The recommendation is realized in three stages: data preprocessing, construction and training of the deep neural network, and generation of the recommendation list.

(1) Data preprocessing. Not all of the data are numeric; for example, student type and online course category cannot be input into the network directly and must first be transformed into vector representations.

(2) Building and training the network model. A deep neural network is constructed, and the data processed in the previous step are divided into a training set and a test set: 80% of the data are used for training and 20% for testing.

(3) Generating the recommendation list. The trained model predicts scores, and the top-*N* online courses are recommended to each student in descending order of predicted score.
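Stages (2) and (3) can be illustrated with an 80/20 split and a top-*N* ranking. The text does not say how the split was drawn, so the sketch assumes a random shuffle with a fixed seed; it also assumes that courses a student has already scored are excluded from the list. Both helper names are hypothetical.

```python
import random

def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle scoring records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def top_n(predicted, seen, N):
    """Rank a student's unscored courses by predicted score and keep the first N."""
    candidates = [(c, s) for c, s in predicted.items() if c not in seen]
    candidates.sort(key=lambda cs: cs[1], reverse=True)
    return [c for c, _ in candidates[:N]]
```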

##### 3.2. Evaluation Indicators

There are many evaluation metrics for top-*N* recommendation algorithms. This experiment uses the HR (hit ratio), NDCG (normalized discounted cumulative gain), RMSE (root mean square error), and MAE (mean absolute error).

HR is calculated as follows:

$$\mathrm{HR} = \frac{\sum_{P \in \text{testset}} \mathrm{hit}(P)}{|\text{testset}|},$$

where $P$ is an online course in the test set, $\mathrm{hit}(P) = 1$ if the online course $P$ also appears in the top-*N* recommendation list and 0 otherwise, and $|\text{testset}|$ is the size of the test set.
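A minimal sketch of HR, assuming the test set is a list of (student, course) pairs and the recommendations are a dictionary of top-*N* lists; the function name is hypothetical.

```python
def hit_ratio(test_items, recommendations):
    """Fraction of test (student, course) pairs whose course is in that student's top-N list."""
    hits = sum(1 for u, c in test_items if c in recommendations.get(u, []))
    return hits / len(test_items)
```

For instance, if only one of three held-out courses appears in the corresponding top-*N* list, HR is 1/3.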

NDCG is widely used to evaluate recommendation ranking and information retrieval ranking. It is sensitive to ranking position: the higher the relevant items are ranked, the larger the NDCG. First, the DCG is introduced:

$$\mathrm{DCG}@N = \sum_{i=1}^{N} \frac{2^{rel_i} - 1}{\log_2(i + 1)},$$

where $rel_i$ is the relevance of the $i$th recommended online course to the student. In this experiment, a positive sample has relevance 1 and a negative sample has relevance 0. The later relevant online courses are ranked, the smaller the overall DCG.

The ideal ranking of each student's recommendations orders them from most relevant to least relevant; its DCG is the maximum achievable value, called IDCG. The NDCG is then calculated as

$$\mathrm{NDCG}@N = \frac{\mathrm{DCG}@N}{\mathrm{IDCG}@N}.$$

The NDCG has a value between 0 and 1, and the larger the value, the better the recommendation effect.
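NDCG for one student's ranked list can be sketched as below, with positions counted from 1 and binary relevance as in this experiment. Returning 0.0 when the list has no relevant item is an assumption, since the ratio is undefined in that case.

```python
import math

def dcg(relevances):
    """DCG@N: sum over positions i = 1..N of (2^rel_i - 1) / log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Divide by the ideal DCG obtained by sorting relevances in descending order."""
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0
```

Placing the single relevant course first gives NDCG = 1; moving it to position 2 lowers the score to 1/log2(3), about 0.63.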

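The RMSE and MAE are calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{(u,i)} \left(r_{ui} - \hat{r}_{ui}\right)^{2}}, \qquad \mathrm{MAE} = \frac{1}{T} \sum_{(u,i)} \left|r_{ui} - \hat{r}_{ui}\right|,$$

where $T$ is the number of online courses with scoring records in the test set, $r_{ui}$ is the real score, and $\hat{r}_{ui}$ is the predicted score.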
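Both error metrics are short to compute from (real score, predicted score) pairs; the helper names are hypothetical.

```python
import math

def rmse(pairs):
    """Root mean square error over (real_score, predicted_score) pairs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (real_score, predicted_score) pairs."""
    return sum(abs(r - p) for r, p in pairs) / len(pairs)
```

Because RMSE squares each error before averaging, it penalizes a few large errors more heavily than MAE does; for errors of 1 and 2, RMSE is about 1.58 while MAE is 1.5.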

##### 3.3. Experimental Results and Analysis

To verify the effectiveness of the proposed algorithm, the MAE was tracked during training. As Figure 3 shows, the MAE gradually decreases as the number of iterations increases and converges to 0.69. Because the proposed model combines student attributes, online course attributes, and content description information, it uses more complete data and tends to reduce errors, which indicates that the model is effective.

Next, the proposed algorithm is compared with other algorithms [24, 25]. To test the influence of the top-*N* size on recommendation performance, *N* was set to 20, 40, 60, 80, and 100, and NDCG and HR were analyzed. The results are shown in Figures 4 and 5. As *N* increases, the NDCG and HR of all three models improve steadily, and the proposed algorithm outperforms the other models on both metrics, indicating a better recommendation effect.

Figure 6 compares the RMSE and MAE of online course group recommendation with those of the other recommendation algorithms. The RMSE and MAE values of the proposed algorithm are lower than those of the other algorithms. This further indicates that, by introducing the attention mechanism to extract the implicit features of students and online courses, the proposed algorithm improves recommendation accuracy.

#### 4. Conclusion

This paper proposes an online course recommendation system based on a double-layer attention mechanism to address the lack of precise, guided course selection on existing online platforms. Introducing the double-layer attention mechanism into the neural network improves the feature extraction ability of the convolutional neural network, and different preference weights are assigned to course features so that the recommendations better match students' preferences. When recommending courses to target students, score data and online course type data are combined to group the recommendation results. The experimental results show that the system achieves better NDCG, HR, RMSE, and MAE, providing theoretical support for accurate, guided course selection on online platforms. Analyzing and optimizing the efficiency of the algorithm is a direction for further research.

#### Data Availability

The labeled datasets used to support the findings of this study are available from the author upon request.

#### Conflicts of Interest

The author declares that there are no conflicts of interest.

#### Acknowledgments

This work was supported by the study on the Practice of Improving Information Technology Application Ability of College Teachers in the Context of Education Informatization 2.0 (no. 20190202047069) and in part by the study on the Measurement Model of Teaching Proximity in Online Courses in Higher Education (no. 19GZQN26).