#### Abstract

Overfitting in a deep neural network leads to low recommendation precision and high loss. To mitigate these issues in a deep neural network-based recommendation algorithm, we propose a recommendation algorithm, LG-DropEdge, joint light graph convolutional network, and the DropEdge. First, to reduce the cost of data storage and calculation, we initialize user and item embedding in the embedding layer of the algorithm. Then, to obtain high-order interaction relationships to optimize the embedding representation, we enrich the embedding by injecting high-order connectivity relationships in the convolutional layer. In the training phase, DropEdge is used to randomly discard connected relationships (interaction edges) to prevent overfitting. Finally, to reasonably aggregate the embedding results learned on all layers, the weighted average is expressed as the final embedding, so that users can make preferences in the item. We conduct experiments on three public datasets, using two performance indicators; namely, recall and NDCG, are used for evaluation. For the Gowalla dataset, compared with the optimal baseline method, recall@20 and ndcg@20 increased by 2.53% and 2.39%, respectively. For the Yelp2018 dataset, recall@20 and ndcg@20 increased by 6.17% and 5.58%, respectively. For the Amazon-book dataset, recall@20 and ndcg@20 increased by 4.82% and 4.67%, respectively. The results show that LG-DropEdge can not only reduce the degree of neural network overfitting but also improve the recommended results’ precision.

#### 1. Introduction

Personalized recommendation is a common recommendation method that has been widely used in social media, advertising, e-commerce [1], and other online services. It effectively alleviates the difficulties involved in a user obtaining personalized content due to the explosive growth of information. Its goal is to estimate the likelihood of users to adopt a product based on historical interaction behaviors, such as purchases and clicks. Thus, many studies have focused on recommendation algorithms that are in line with user needs and are practical.

Collaborative filtering (CF) is a method used to build a large-scale recommendation system. Its advantages are strong interpretability and high maturity. The general idea of the algorithm is to predict the items that users may be interested in by analyzing the interaction between users and items. At a high level, the similarity measure between users is based on user rating history, so that ratings from like-minded users can be used to predict the ratings of interested users; it can also be based on the ratings of users who were interested in the past. To realize this idea, a common approach is to reconstruct historical interactions by parameterizing users and items and then predict user preferences based on parameters [2]. Traditional CF recommendation algorithms can be grouped into two categories: neighborhood-based CF methods [3, 4] and matrix factorization (MF) [5–7] methods. MF have been studied more extensively. It is usually assumed that the scoring matrix can be approximated by two lower triangular matrices. On the basis of matrix factorization, probabilistic MF (PMF) optimizes the maximum likelihood by minimizing the mean square error between the observed items and the reconstructed level [5]. Biased MF improves PMF by merging user and item specific deviations and global deviations [6]. In practical applications, the CF method has some problems, such as a learning process not being displayed, the high-level interaction information between the user and the item not being considered, and the implicit relationship between the user and the item being ignored.

In recent years, the deep-learning algorithm represented by Convolutional Neural Networks (CNNs) has made great progress in many aspects [8, 9], but its design mostly uses regular Euclidean data (which can be expressed in the form of a sequence or a two-dimensional grid), such as image, voice, and natural language. CNNs are suitable for Euclidean data [10] but have limitations in networks with non-Euclidean structures. This is an issue because not everything can be represented as a sequence or two-dimensional grid, such as social networks or chemical molecules. As these data can be regarded as special cases of graph-structured data, researchers naturally think of generalizing CNN to graphs.

#### 2. Related Research

##### 2.1. Graph Convolutional Network

A graph convolutional network (GCN) [11] can overcome the problems of CNNs only being applicable to Euclidean data and can capture the characteristics of a network structure. Development has been rapid and research in this direction is generally divided into two categories: methods based on spectral decomposition and methods based on spatial structure. Spectral decomposition methods mainly deal with the spectral domain of the graph. A spectral network [12] defines the convolution operation in the Fourier domain by calculating the feature decomposition of the Laplacian matrix of the graph, but this convolution operation will cause the convolution kernel not to meet the locality. Henaff et al. proposed introducing a parameter term with smooth coefficients to solve the locality problem of the convolution kernel [13]. Defferrard et al. proposed ChebNet, using K-order convolution to define the graph; the convolution can avoid the calculation of redundant Laplacian matrix eigenvectors [14]. Kipf and Welling proposed reducing the convolution operation to first order [15], which greatly reduces the calculation of graph convolution. As a simplification of spectral decomposition methods, the GCN was formally proposed. To learn the implicit relationship between different nodes, Li et al. proposed a residual Laplacian that an adaptive graph convolution network should be added to the original graph [16]. Methods based on spatial structure mainly deal with graphs of different structures and directly define the convolution operation on the graph. GraphSAGE (Graph Sample and AggreGatE) is a method to generate the embedding vector of the target vertex by learning a function that aggregates the representation of neighbor nodes and calculates the node representation inductively [17]. Unlike earlier methods, the graph attention network (GAT) innovatively uses a self-attention mechanism to provide different weights for the heterogeneity of different nodes [18]. Based on GAT, the heterogeneous graph attention network (HAT) refined two attention mechanisms, namely, node-level attention and semantic-level attention [19].

Due to the powerful expressive ability of graphs, graph-based recommendation algorithms have become one of the most popular research methods. The goal is to reorganize the interactive data into a user-item bipartite graph and use the high-level connectivity between users and items to enrich its representation. PinSage [20] uses local convolution to mark the nodes of the graph-structured data and uses multiple convolution modules to aggregate the local neighborhood features of the nodes to generate node embeddings. Graph Convolutional Matrix Completion (GC-MC) [21] applies a GCN to the user-item graph but only uses the signal of the first-order neighbor. Through random browsing on the graph, Hop-Rec [22] combines matrix decomposition and a graph structure from the neighborhood of each user to obtain high-level information from items. Neural Graph Collaborative Filtering (NGCF) proposed by Wang et al. [23] encodes collaborative signals hidden in user-item interactions and spreads and embeds them in bipartite graphs to achieve high-level neighborhood aggregation. However, its direct inheritance of GCN makes the design quite substantial with high algorithm complexity and difficult algorithm training. Based on the improvement of the traditional GCN, a series of new algorithms have been proposed. Simple Graph Convolution (SGC) [24] eliminates the GCN layer and employs a nonlinear relation between time and a linear algorithm to reduce the complexity of the algorithm.

##### 2.2. LightGCN Algorithm

The previous works that combine GCNs with recommendation mostly inherit GCN and increase its generalization ability. Based on an in-depth analysis of GCN, from the perspective of simplifying GCN design to make it more concise and more suitable for recommendation algorithms, a new algorithm was proposed: light graph convolutional network (LightGCN) [25]. It uses only the GCN as the foundation for neighborhood aggregation. It has been verified via thorough ablation experiments on the special transformation and nonlinear activation of NGCF inherited from the GCN. The study concluded that the two operations inherited from GCN, nonlinear activation and feature transformation, do not make a positive contribution to NGCF. Furthermore, removing them can significantly improve recommendation precision, showing that certain operations in the GCN bring no benefit to the recommendation task and reduce the effectiveness of the algorithm. Therefore, in LightGCN, each user (or item) is first associated with ID embedding, which is then propagated on the user-item interaction graph to enrich its representation. Finally, the embeddings and weights learned in different layers are combined for recommendation prediction. This not only makes the entire algorithm concise in structure, but also gives a great performance improvement compared to other recommendation methods.

##### 2.3. DropEdge

DropEdge [26] involves randomly removing a certain number of edges from the input graph in each training phase to address overfitting and oversmoothing. Overfitting occurs due to the use of a parameter algorithm to fit the distribution of limited training data. The learned algorithm fits the training data well but is not suitable for the test data. Overfitting weakens the generalization ability of small dataset. Oversmoothing isolates the output representation from the input features as the network depth increases, thus hindering algorithm training. DropEdge can be regarded as a data enhancement technique that increases the randomness and diversity of input data, thereby better preventing overfitting. It can also be regarded as reducing message passing. Losing some edges makes the node connections sparser, which avoids oversmoothing to a certain extent when the GCN is deep.

In addition, DropEdge is different from Dropout [27] and DropNode [28]. Dropout disrupts the feature matrix by randomly setting the feature dimension to zero, but because it does not change the adjacency matrix, it does not have an obvious impact on the overfitting problem. DropNode samples the subgraph and is used for small batch processing. Its principle is that discarding certain nodes can be understood as a special kind of edge discarding. DropNode is for nodes, and edge discarding is achieved indirectly by discarding nodes, whereas DropEdge is edge-oriented and can retain the characteristics of all nodes. That is, with DropEdge, the node only loses the interaction with a certain node, but this does not affect the interaction between the node and other nodes, providing greater flexibility and wider applicability.

Therefore, we propose a joint LightGCN and DropEdge recommendation algorithm named LG-DropEdge. This algorithm is based on the concept of the LightGCN algorithm and integrates the DropEdge which slightly improves the algorithm prediction.

The main contributions of this paper are as follows: (1) we proposed new hybrid recommendation algorithm (2) adding DropEdge to the GCN to enrich input and reduce message passing and (3) changing the final representation of LightGCN from the original average of each layer to a weighted average. Experiments on multiple public datasets verified its advantages and performance.

#### 3. Methods

This section describes the proposed LG-DropEdge. The algorithm can be divided into two parts: a light graph convolution algorithm, which is the basic core part of the algorithm and DropEdge to mitigate overfitting and oversmoothing problems caused by deep networks.

##### 3.1. LG-DropEdge Algorithm

The recommendation algorithm based on the GCN uses the topological structure of the graph to spread and aggregate the information of neighboring nodes and learn the embedding of nodes. The algorithm structure is shown in Figure 1. It is usually divided into three layers: an embedding layer, a convolution layer, and a prediction layer. LightGCN is based on NGCF [23]. Ablation experiments show that the two operations inherited from GCN feature transformation and nonlinear activation do not bring any benefits but negatively impact algorithm training by increasing difficulty. Removing them can significantly improve precision. This reflects that adding useless operations to the target task in GCN does not bring any benefits and reduces effectiveness.

###### 3.1.1. Embedding Layer

In the LG-DropEdge, the main tasks of the embedding layer are to express the entities (users and items) and relationships in the user-item interaction diagram as low-dimensional vectors and to retain all the information of the interaction diagram, which can reduce data storage and calculation costs and filter out some noise data. Following the mainstream recommendation algorithm [23], the IDs of users and items are mapped to vectors using one-hot encoding, denoted by and , where is the embedding size. Here, and are the initial vectors of user embedding and item embedding, respectively. The number of users is and the number of items is .

###### 3.1.2. Convolutional Layer

After obtaining the embedding representation of the user (item) node, based on the graph neural network message passing method rule [23], the collaborative signal is obtained on the interactive graph structure and the embeddings of the user and item are optimized. This mainly involves the construction of signals and the aggregation (update) of node embedding. The advantage lies in the embedded representation that can be displayed to associate users and items with high-level collaboration information. This section shows the embedding learning process of the first-order signal and the higher-order signal extended from the first-order signal.

*(1) First-Order Signal*. The consumer interaction between the user and the item can be used as a characteristic of the user and can be regarded as a collaborative signal of two users.

Structure of the signal: for users and items that have an interactive relationship, the resulting signal is defined aswhere is the propagated signal, and are embedded inputs, and is the attenuation coefficient of each propagation on the control . Finally, is the signal encoding function, expressed aswhere is set to the graph Laplacian norm and and denote the first-order neighbor sets of the user and the item , respectively.

Aggregation of node embedding: this is used to enrich the form of embedding by summarizing the signals of neighbors near the user node . The aggregate function is defined aswhere represents the embedded representation of the user obtained after the first embedding and propagation. The self-connection of is not considered here. Similarly, the representation of the item obtained through embedding and propagation can be obtained.

*(2) High-Order Signals*. Through the improvement of first-order signals, multiple layers of embedding can be stacked to represent high-order collaborative signals. This high-level signal enables better interpretability in estimating the correlation between users and items.

Signal structure: enrich the signal by stacking multiple layers of embedding, defined as follows:

Node embedding aggregation: aggregate multilayer signals, which can receive signals propagated from layer neighbors, are defined as follows:where is the representation generated from the previous signal transfer step, which is used to store collaborative signals from layer neighbours. Similarly, the layer representation of the item can be obtained.

###### 3.1.3. Prediction Layer

After spreading the layer, the representation of the user , namely, , can be obtained. The embeddings obtained at each layer are further combined to form the final representation of the user (item):where represents the importance of layer embedding in forming the final embedding, which is equivalent to an attention mechanism. This is because, when the number of layers increases, the weight is on a downward trend. The final representation is changed from the original average of each layer to a weighted average. The deeper the layer, the greater the weight, which emphasizes the importance of the deep signal. To prevent the algorithm becoming too complex, is set to .

Finally, the inner product is used to estimate the user’s preference for the item:

It can be used as the ranking score generated by the recommendation.

##### 3.2. DropEdge Module

The main task of DropEdge [26] is to randomly lose a certain number of edges from the input graph at each training phase, which can be understood as data enrichment. This can increase the randomness and diversity of the input data, which can mitigate overfitting. DropEdge’s approach can also be understood as the simplification of data transmission. Losing some edges makes node connections sparser, which can effectively avoid oversmoothing.

To apply the DropEdge, (5) is transformed into a matrix form, as follows:where is the representation of users and items obtained after embedding in the propagation layer and is the embedding size. The initial signal is set to ; that is, and . Finally, is the Laplacian matrix (symmetrical normalization) of the user-item interaction graph, which is defined aswhere is the adjacency matrix, is the pair angle matrix, is the number of nonzero entries in the *l*-th row vector of the adjacency matrix , and is the user-item interaction matrix. Finally, the final embedding matrix used for algorithm prediction is

In each training phase, DropEdge randomly selects a certain percentage of edges of the input graph. In other words, it randomly sets nonzero elements in the adjacency matrix to zero, where is the total number of edges, and is the loss rate. The obtained adjacency matrix is expressed aswhere is expanded from a random subset of the original edge size of . Then, the matrix is normalized and expressed as , which is used to replace in the convolution operation in (9).

##### 3.3. Algorithm Training

Reference algorithm [2], using paired Bayesian Personalized Ranking (BPR) loss [29], considers the relative order between observed and unobserved user-item interactions and encourages the prediction of observed items to be higher than that of unobserved items; the interactions with corresponding observed items are more reflective of user preferences.

The loss function of the algorithm iswhere is the paired training data, is the observed interaction, is the unobserved interaction, is a sigmoid function, controls the intensity of regularization, and the training parameter is only the embedding of the 0th layer; that is, . A small batch of the Adam [30] optimizer is used to predict the algorithm and update the algorithm parameters.

#### 4. Experiments

To verify the recommended performance of the proposed LG-DropEdge, this study used the PyTorch deep-learning framework. The operating system used in the experiment was Windows 10, the graphics card was a Nvidia Titan V, and the CPU was an i7-8700K. The Python version was 3.6. Experiments and analysis were conducted using the Pycharm2020 development tool and PyTorch deep-learning framework.

To compare the performance of the proposed algorithm with other algorithms and verify the effectiveness of the algorithm’s own modules, we designed three sets of experiments: (Q1) Compared with the most advanced LightGCN, how is the performance of LG-DropEdge compared? (Q2) How do the settings of different improved modules affect the performance of LG-DropEdge? (Q3) How do the hyperparameter settings (such as edge loss rate) impact the effectiveness of LG-DropEdge?

##### 4.1. Experimental Dataset

To evaluate the effectiveness of LG-DropEdge, we conducted experiments on three public datasets: Gowalla, Yelp2018, and Amazon-book. The Gowalla dataset uses the login dataset from Gowalla [31], where users share their location by logging in. Yelp2018 comes from the 2018 version of the Yelp Challenge, in which companies such as hotels and coffee shops were used as items. Amazon-book [32] comes from book data in Amazon reviews. There are differences in data entities, numbers of interactions, and sparsity, which can meet the needs of different data characteristics as required by the algorithm. Table 1 summarizes the statistics of the three datasets.

Inspired by mainstream recommendation algorithms [23, 25], for each dataset, 80% of each user’s historical interactions were randomly selected to form the training set, and the rest were used as the test set. From the training set, 10% of the interactions were randomly selected as the validation set to adjust the hyperparameters. Each observed user-item interaction was treated as a positive instance. A negative sampling strategy was then used to pair it with a negative item that the user had not previously consumed.

##### 4.2. Experimental Setup

###### 4.2.1. Evaluation Index

The performance assessment of the recommendation algorithm has many aspects, divided into a quantitative calculation and qualitative description. This section describes the performance of the algorithm from the perspective of prediction precision. When calculating this indicator, an offline dataset is required that contains historical user behavior data. The dataset is then divided into a training set and a test set, and finally the user’s behavior on the test set is predicted by establishing a user’s behavior and interest algorithm on the training set. The coincidence degree of the predicted behavior and the actual behavior on the test set are calculated as the prediction precision. For users’ Top-N recommendations, algorithm [14, 29] used evaluation indicators, such as precision, recall, and Normalized Discounted Cumulative Gain (NDCG) for *N* = {20, 100}.

Let denote the recommendation list given to the user through training and let denote the behavior list on the test set.

The precision represents the proportion of the number of samples that are correctly predicted to the total number of samples and is defined as

The recall of the recommended result represents the probability that the sample is correctly predicted to occupy the actual sample, which is defined as

The NDCG of the recommended result is defined aswhere is the cumulative gain of loss, is the maximum value of under ideal conditions, and indicates the user’s rating for the *i*-th item. Finally, indicates that the results are sorted in descending order of relevance, where the set consisting of the previous results is adopted; that is, according to the most, sort the results in an optimal way.

###### 4.2.2. Baseline

To demonstrate the effectiveness of the proposed algorithm, we compared it with the following methods: NeuMF [2]: this method is an advanced neural collaborative filtering algorithm that uses multiple hidden layers above elements and the concatenation of user and item embeddings to capture their nonlinear characteristic interactions. Hop-Rec [22]: this method combines matrix decomposition and graphs through random walks on the graph, combined with the degree of the vertices, sampling different positive samples with a certain probability, and assigning attenuation coefficients to the ranking pairs obtained in different orders. NGCF [23]: this method explicitly introduces the collaborative signal into the collaborative filtering algorithm and achieves this by using the high-level connectivity in the user-item interaction graph. LightGCN [25]: based on NGCF, this method removes feature changes and nonlinear activation through ablation experiments and adds a weight factor to the final aggregation, which is greatly improved. It is the latest direction of graph convolution.

###### 4.2.3. Parameter Settings

With reference to NGCF [23] and LightGCN [25], the LG-DropEdge algorithm is implemented in PyTorch. Considering the settings of the comparison experiment, choose the same hyperparametric settings as NGCF and LightGCN, etc. To ensure the accuracy and fairness of the comparison results, the embedding size of all algorithms is set to 64; the learning rate is set to ; the number of layers is set to 3; the batch size is set to 2048; the regularization coefficient is set to ; considering the convergence of loss, the number of training is set to 1500. Grid search is performed on the hyperparameters and the edge loss rate is determined in .

##### 4.3. Performance Comparison (Q1)

###### 4.3.1. Performance Comparison with LightGCN

Compared with the LightGCN algorithm, Table 2 records the algorithm performance for different datasets and different indicators and shows the percentage improvement of each indicator. It reveals the clear improvement made by the proposed LG-DropEdge.

In the three sets of comparative experiments, LG-DropEdge performs better than LightGCN. In the three datasets, the algorithm performance on the Yelp2018 dataset has been greatly improved; the precision index has increased by 6.04% on average. The main reason for this is that the Yelp2018 datasets are sparser than the other two datasets. A dataset with high sparseness has a large number of user interaction items, and the data characteristics can be better retained after multiple algorithm training phases. In addition, the average precision of the three datasets increased by 4.89%, recall increased by 4.51%, ndcg increased by 4.21%, and overall performance improved.

###### 4.3.2. Overall Comparison

To verify the advantages of the LG-DropEdge recommendation algorithm in terms of precision, we implemented performance comparisons with other classic algorithms (NeuMF [2], Hop-Rec [22], NGCF [23], and LightGCN [25]). The experimental results are shown in Table 3.

The following can be seen from Table 3:(1)NeuMF performs better than Hop-Rec on the Amazon-book dataset. Since Hop-Rec is implemented by combining MF and graphs, its performance largely depends on the random walk algorithm, and the effect is not very obvious because it does not make full use of hight-order connectivity of graphs.(2)Since Hop-Rec is implemented by combining MF and graphs, it does not make full use of high-order connectivity. Its performance largely depends on the random walk, and the effect is not very obvious.(3)NGCF yields a significantly better performance than NeuMF and Hop-Rec because it explicitly introduces the collaborative signal into the system filtering algorithm and spreads it on the interactive graph. However, its algorithm directly inherits GCN, which leads to increased algorithm complexity and training time.(4)LightGCN, as a simplification of NGCF, yields a powerful performance, but its algorithm does not solve the problem of deep-network overfitting. Further, it uses relatively simple aggregation functions, which limit the effect of improvement.(5)LG-DropEdge yields the best performance on all datasets, particularly Yelp2018, which showed an increase of more than 5%. Thus, the LG-DropEdge algorithm can be used to improve the precision of a recommendation system.

##### 4.4. Ablation Experiment (Q2)

To demonstrate the feasibility of LG-DropEdge, the algorithm is subjected to ablation experiments. To verify the effectiveness of the combined modules, we performed several sets of experiments on the three datasets: removing the improved aggregation function and keeping DropEdge, named LightGCN + DropEdge; removing DropEdge and keeping the improved aggregation function, named LightGCN + *f*; removing both DropEdge and the improved aggregation function, which is the original LightGCN; and the proposed method, LG-DropEdge.

The results are as shown in Table 4.

The following findings can be made from Table 4:(1)Compared with LightGCN, precision, recall, and ndcg increased by 2.33%, 1.48%, and 1.23%, respectively, in the LightGCN + *f* algorithm(2)Compared with LightGCN, precision, recall, and ndcg decreased by 0.72%, 0.49%, and 0.19% respectively, in the LightGCN + DropEdge algorithm(3)Compared with LightGCN, precision, recall, and ndcg increased by 3.04%, 2.53%, and 2.39%, respectively, in the LG-DropEdge algorithm

The results are as shown in Table 5.

The following findings can be made from Table 5:(1)Compared with LightGCN, precision, recall, and ndcg increased by 5.65%, 5.38%, and 5.00%, respectively, in the LightGCN + *f* algorithm(2)Compared with LightGCN, precision, recall, and ndcg decreased by 2.12%, 2.58%, and 2.69%, respectively, in the LightGCN + DropEdge algorithm(3)Compared with LightGCN, precision, recall, and ndcg increased by 6.36%, 6.17%, and 5.58%, respectively, in the LG-DropEdge algorithm

The results are as shown in Table 6.

The following Findings can be made from the Table 6:(1)Compared with LightGCN, precision, recall, and ndcg increased by 2.92%, 1.69%, and 1.56%, respectively, in the LightGCN + *f* algorithm(2)Compared with LightGCN, precision, recall, and ndcg increased by 2.34%, 2.65%, and 2.49%, respectively, in the LightGCN + DropEdge algorithm(3)Compared with LightGCN, precision, recall, and ndcg increased by 5.26%, 4.82%, and 4.67%, respectively, in the LG-DropEdge algorithm

Combining Tables 4–6, the following can be obtained:(1)Compared with LightGCN, LightGCN + *f* algorithm has different degrees of improvement in the three datasets of precision, recall, and ndcg(2)Compared with LightGCN, the LightGCN + DropEdge algorithm has decreased in the precision, recall, and ndcg indicators of the Gowalla and Yelp2018 datasets, but there has been a small increase in the Amazon-book dataset. It can be seen that adding DropEdge has less impact on the performance of less sparse datasets(3)Compared with LightGCN, LG-DropEdge has a greater improvement over LightGCN-f on the three indicators of precision, recall, and ndcg in three datasets, which fully demonstrates that modifying the aggregation function and adding DropEdge make the algorithm more accurate

To explain the effectiveness of the DropEdge, we compared the loss tested under the three datasets at the same ndcg level. The performance on the test set measures the true performance of the algorithm. Under the same test set index (ndcg@20), the training performance was the same, and the train_loss produced by adding DropEdge with the training set was higher, indicating that it is at the same level and there is no mitigation of overfitting. Figure 2 compares the train_loss for 23 sets of data in Gowalla dataset, Figure 3 compares the train_loss for 55 sets of data in Yelp2018 dataset, and Figure 4 compares the train_loss for 41 sets of data in Amazon-book dataset (ndcg is arranged in increasing order), showing that it has increased to different degrees. It can be seen that adding DropEdge is effectively mitigating overfitting. Furthermore, as ndcg increases, LG-DropEdge performs more smoothly in terms of loss, in contrast to the irregular fluctuations of LightGCN. Therefore, the algorithm proposed in this paper has many advantages in mitigating overfitting and oversmoothing problems.

##### 4.5. Hyperparameter Experiment (Q3)

To verify the degree of influence of the edge loss rate on the algorithm, the edge loss rate is determined in (using three datasets), as in [26].

Figure 5 shows the performance for precision, recall, train_time, and ndcg under different edge loss rates. As seen in Figure 5(a), starting from 0.0, precision and recall maintain the same upward trend as the edge loss rate increases. The first peak is reached at 0.4, after which the two indicators decrease steadily and slightly between 0.4 and 0.5, increase significantly between 0.5 and 0.6, and achieve their best performance at 0.6. After 0.6, they begin to decline sharply. The precision index is at its worst value at 0.8. Meanwhile, as seen in Figure 5(b), the ndcg index maintains a steady increase from 0.0, achieves maximum value at 0.6, after which it drops sharply, and reaches its worst value at 0.8. The train_time index starts at the lowest value at 0.0 and then increases rapidly to 0.2, before declining slowly. Therefore, the overall trends for precision, recall, and ndcg are increasing from the start 0.0, achieving a maximum at 0.6, and thereafter declining rapidly. Time is expressed as the average train_time for an epoch. It is obvious that when DropEdge is increased, the train_time increases significantly. Undoubtedly, it increases the difficulty of training. From Figure 5, when the train_time is relatively short, the precision index performs the best and the edge loss rate of DropEdge is finally determined to be 0.6.

Figure 6 shows the performance for precision, recall, train_time, and ndcg under different edge loss rates. As can be seen from the figure, in Figure 6(a), the precision and recall both increase between 0.0 and 0.2, and they begin to decline between 0.2 and 0.4, increase between 0.5 and 0.6, and then continue to decline. The only trend difference is between 0.4 and 0.5, precision increases, and recall decreases, but the overall trend is the same, first increasing to the highest point () and then slightly lowering, then slightly increasing, and finally falling to the lowest. In Figure 6(b), the ndcg indicator increases between 0.0 and 0.2, reaches the highest point at that time, drops to the lowest point between 0.2 and 0.4, and then increases sharply between 0.4 and 0.5; it increases slowly between 0.5 and 0.6 and begins to decline after 0.6; while train_time indicator starts to decline after rising from 0.0 to 0.2 to the highest point. Combined with the two subgraphs, all three accuracy indicators reach the best state at that time (), then begin to decline, then rise and reach the second peak at , and then begin to decline. According to the rule of relatively little train_time and priority in accuracy index performance, the edge loss rate of DropEdge is finally determined to be 0.2.

Figure 7 shows the performance for precision, recall, train_time, and ndcg under different edge loss rates. It can be seen from the figure that, in Figure 7(a), the two indicators generally show a continuous increase trend. Only precision has a small decline between 0.4 and 0.5, both of which reach the maximum value at ; in Figure 7(b), ndcg also shows an increase trend, reaching the maximum value at , while train_time at reaches the maximum value and then continues to decline. The two indicators have an intersection between 0.5 and 0.6. After the intersection, they both develop in a positive direction and reach the optimal state of the algorithm at , especially after showing a rapid positive growth, so the edge loss rate of DropEdge is finally determined to be 0.8.

The purpose of this section is to fully verify the improvement of the performance of the algorithm after adding the DropEdge module. It can be seen that the final boundary edge loss rate for different datasets is different. It can be seen that, in machine learning, the algorithm’s adaptability to different datasets is different. Based on the results of the above three datasets, there is reason to believe that the performance of arithmetic has been greatly improved by adding the DropEdge module.

#### 5. Conclusion

This paper proposed a joint light graph convolutional network and DropEdge recommendation algorithm (LG-DropEdge). The DropEdge was developed based on the LightGCN framework, which improves the final multilayer fusion aggregation function, mitigates the overfitting problem caused by the deep network algorithm, and improves the precision, recall, and ndcg, and enhances the interpretability of the recommendation algorithm. The proposed algorithm has improved the recommendation accuracy in the offline experiments of the three datasets, but it has not been verified in practical application. In the later research, we will continue to apply other public datasets to our algorithm and gradually apply them to the actual recommendation system to improve the generalization of the algorithm. The existing recommendation algorithm based on graph convolution network generally controls the number of convolution layers at 3 layers considering the attenuation degree of propagation factor, while DropEdge fully demonstrates that its performance is generally in deeper convolutional networks. This paper sets the number of convolution layers to be 3 layers considering the sparseness of datasets and the needs of contrast algorithms. Therefore, future research is to select more sparse datasets for comparative experiments and further analyse the relationship between the number of convolutional layers and the recommendation accuracy.

#### Data Availability

The data used to support the findings of the study are included in the supplementary files.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Acknowledgments

This paper was supported by Young Scientists’ Fund of National Natural Science Foundation of China and Natural Science Foundation of Liaoning Province.

#### Supplementary Materials

gowalla-test/train.txt;yelp-test/train.txt;amazon-test/train.txt includes data used to support the findings of this study. run-gowalla/yelp2018/amazon.txt includes data values given in Tables 2–6 and figures, NGCF and LightGCN results, the datasets needed for the experiment, and part of the log files of the experiment. The results of NeuMF and Hop-Rec in Table 3 are quoted from Xiang Wang, Xiangnan he, Meng Wang, Fuli Feng, and Tat Seng Chua, “neural graph collaborative filtering,” in SIGIR, pp. 165-174. The code data used to support the findings of this study have been deposited in the [Baidu Netdisk] repository (link: https://pan.baidu.com/s/1JUFyBpoc3RPoKkPsQXvXkQ; extraction code: gaz1).* (Supplementary Materials)*