Abstract

The recommendation algorithm can break the restrictions of the topological structure of social networks, enhance the communication power of information (positive or negative) on social networks, and, to a certain extent, guide the way news spreads through them. In order to solve the problem of data sparsity in news recommendation for social networks, this paper proposes a deep learning-based recommendation algorithm in social network (DLRASN). First, the algorithm serializes the behavioral data generated when users in the same social network browse information. Then, a global variable is introduced to optimize the way Skip-gram encodes the central sequence, so that online users' browsing habits can be learned. Finally, the information that target users are interested in is calculated by a similarity formula and recommended in social networks. Experimental results show that the proposed algorithm can improve the recommendation accuracy.

1. Introduction

At present, social network services built on online information flow not only have a huge number of users but also accumulate a large amount of information data through users' active online behaviors. For example, well-known domestic products such as NetEase News, Tencent News, and Headlines Today count their monthly active users in the billions. Monthly active users of YouTube, the well-known foreign online video company, exceeded 2 billion in 2019, and the number of videos on the site reached the millions. According to a 2012 report by the International Data Group (IDC), the total global data in 2020 were expected to be 22 times those of 2011, reaching 35.2 ZB [1]. In social networks composed of information flow services, information is usually disseminated probabilistically according to the topology of the social network. However, such huge volumes of data congest the spread of information in the social network, and a large amount of information is never browsed; the direct consequence for users is information overload. The recommendation algorithm can not only break the limitations of the traditional social network topology and enhance the spread of information in the social network but also improve the efficiency with which users obtain information and thus solve the problem of information overload. Therefore, personalized recommendation technology has become an important topic of common concern in academia and industry today.

The core of a recommendation system is the recommendation algorithm. After years of research and development, recommendation algorithms are mainly divided into collaborative filtering algorithms and content-based algorithms. Collaborative filtering makes recommendations for users according to their past history and ratings, while the content-based approach mainly depends on users' preferences and content information. Both algorithms have shortcomings. For example, when modeling users' behavior data, the real behavior data matrix tends to be very sparse, resulting in poor prediction accuracy. To alleviate the sparsity of the behavior data matrix and improve recommendation accuracy, traditional methods improve on basic matrix factorization. As research has developed, natural language processing models have come to be used as feature extractors in recommendation models. Embeddings have been leveraged for various types of recommendations on the Web [2–4], including item recommendation [5], advertising recommendation [6], movie recommendations [7, 8], and music recommendations [9]. Finally, similar extensions of embedding approaches have been proposed for social network analysis, where random walks on graphs can be used to learn embeddings of nodes in the graph structure [10, 11].

In short, the users' browsing data used for online information flow recommendation are often highly sparse. Besides, most traditional models only learn shallow features of the interaction data and ignore features of other browsing habits, which makes their feature representations inadequate and leads to poor recommendation results. In response to the above problems, it is very important to find a new way to model user interaction data for information recommendation in social network services. In this paper, the public news-browsing data of users on Caixin.com are taken as the research object. On the basis of the traditional content-based recommendation algorithm and the idea of word embedding, this paper proposes a deep learning-based recommendation algorithm in social network (DLRASN). First of all, the Skip-gram algorithm in Word2Vec is introduced into the field of information recommendation in social networks: each continuous sequence of browsed information is treated as a sentence, and each browsing action is treated as a word in that sentence. Secondly, an embedding operation is applied to the serialized user browsing data; while the central sequence predicts the context sequence, the item click feature is introduced as a global variable that forms an impact factor for the final result. Finally, a Top-N recommendation set is formed. The experimental results show that the proposed method optimizes the recommendation effect to a certain extent and expands the application areas of embedding models.

2. Related Work

2.1. Content-Based Recommendation Algorithm

The basic principle of the content-based recommendation algorithm is to obtain a user's interest preferences from his historical behavior and then recommend similar items. The data to be studied include item information (such as text descriptions, labels, user comments, and manually labeled information), user information (such as age, gender, preference, region, and income), and interaction behavior (such as commenting, collecting, liking, watching, browsing, clicking, adding to cart, and purchasing). These data are then used to extract features and construct a user interest model, transforming items into measurable attributes such as text represented by vectors, text types, and release time. Finally, various methods can be used to calculate similarity for recommendation [12].

The content-based recommendation algorithm generally depends on the user's own behavior and the item's own attributes to provide recommendations. It focuses on analyzing the extracted features and pays no attention to other users' behavior. Once the features of two items are found to be similar, the algorithm marks them as the same category; thus, the algorithm works well for recommending similar items. For example, after a user watches the romance movie "Crazy Call," the content-based recommendation algorithm may recommend the movie "Love Saint" because the two movies have similar characteristics. Melucci [13] combined the vector space model (VSM) with the TF-IDF algorithm and used the model to calculate news text similarity. Blei et al. [14] used the correlation between corresponding topics in the context to establish the LDA model.

Although the content-based recommendation algorithm has obtained good results, the sparsity of user behavior data affects the accuracy of recommendation to a certain extent. To address this shortcoming, the concept of embedding has been introduced in the fields of Web search, e-commerce, and marketplaces. Researchers found that just as word embeddings can be trained by treating a sequence of words in a sentence as context, embeddings of user actions can be trained by treating a sequence of user actions as context, e.g., items that were clicked or purchased [5] and searches and rentals that were clicked [15]. Ever since, embeddings have been applied to various types of recommendations on the Web, including music recommendations, house search, and movie recommendations. However, there are few research studies on news recommendation in social networks.

2.2. Word Embedding Model

The embedding model is a concept from natural language processing (NLP). In traditional NLP, the classic bag-of-words model is the earliest embedding model. However, it only considers word frequency in the article and ignores the word-order information of the sentence, and when the corpus is huge, the generated word vectors are very sparse, which affects the accuracy of the semantic representation. Hence, the neural network language model gradually replaced it [16]. But this model can only handle fixed-length sequences, and training slows down when the vocabulary of the corpus becomes too large. To solve these problems, Mikolov et al. [17] proposed the Word2Vec model in 2013. Word2Vec first converts the words in the vocabulary to one-hot codes; the codes are then mapped to low-dimensional dense word vectors by a classic three-layer neural network. Word2Vec is very suitable for sequence problems because there is a very strong correlation between adjacent words, and it can learn the hidden features of the entire sequence; a typical example is the text sequence. Word2Vec mainly includes two models: Skip-gram and continuous bag of words (CBOW).

The Skip-gram model uses the central sequence to predict its context sequence, as shown in Figure 1. The central word $w_t$ can be represented by a $V$-dimensional one-hot vector, where $V$ is the length of the vocabulary. The words $w_{t-c}, \ldots, w_{t-1}$ and $w_{t+1}, \ldots, w_{t+c}$ are defined as the relevant forward-looking and backward-looking context (neighborhood) of the central word. For example, for the sentence "The quick brown fox jumps over the lazy dog," if we select "fox" as the central word and predict the two words on either side, the task of the model is to calculate the occurrence probabilities of "quick," "brown," "jumps," and "over."
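To make the windowing concrete, the following Python sketch (our illustration, not part of the original model description; the helper name skipgram_pairs is ours) enumerates the (center, context) training pairs for the example sentence:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a symmetric window."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

sentence = "the quick brown fox jumps over the lazy dog".split()
# With "fox" as the central word and a window of 2, the model must predict
# "quick", "brown", "jumps", and "over":
print([ctx for w, ctx in skipgram_pairs(sentence) if w == "fox"])
```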

The CBOW model shown in Figure 2 predicts the central sequence from the context sequence: $w_{t-c}, \ldots, w_{t+c}$ represent the context sequence as the input vectors, and $w_t$ represents the central sequence as the output vector, where each input vector is $V$-dimensional and $V$ is again the length of the vocabulary.

As shown in Figure 2, after the CBOW model is trained, each word will have served as the central word once, so the number of predictions equals the size of the entire vocabulary and the time complexity is $O(V)$. However, the Skip-gram model makes more predictions than CBOW: when a word is used as the central word, the model must predict each of its context words once, so each central word requires up to $K$ predictions, where $K$ is the size of the context window, and the Skip-gram model's time complexity is $O(KV)$. Although the time complexity of Skip-gram is higher than that of CBOW, the result trained by the Skip-gram model is more accurate. Thus, when the data are small or sparse, the Skip-gram model can learn more information and obtain more precise word vectors. This is the main reason for selecting the Skip-gram model in this paper.

3. The Proposed Approach

The flow of the proposed algorithm is shown in Figure 3. The main work is to improve the word embedding model by adding the click list as a global context and to use the improved model as the feature extractor of the content-based recommendation algorithm. The details are introduced as follows.

3.1. The Optimization of Word Embedding Model

Suppose that the dataset $S$ is composed of news-browsing sequences collected from $N$ users, where each sequence $s = (l_1, l_2, \ldots, l_M) \in S$ is a continuous sequence of $M$ news items browsed by one user. Whenever the time interval between two consecutive browses exceeds $t$ days, a new browsing sequence is started. The goal of constructing this dataset is to learn a $d$-dimensional representation for each browsing list $l$ with the word embedding model, that is, to learn the representations of browsing data using the Skip-gram model by maximizing the objective function over the entire dataset $S$. The objective function is defined as follows:

$$\mathcal{L} = \max \sum_{s \in S} \sum_{l_i \in s} \sum_{-m \leq j \leq m,\ j \neq 0} \log P(l_{i+j} \mid l_i). \qquad (1)$$

Equation (1) requires evaluating $P(l_{i+j} \mid l_i)$, the probability of observing browsing list $l_{i+j}$ in the contextual neighborhood of browsing list $l_i$. It is defined using the soft-max as follows:

$$P(l_{i+j} \mid l_i) = \frac{\exp\left(\mathbf{v}_{l_i}^{\top} \mathbf{v}'_{l_{i+j}}\right)}{\sum_{l=1}^{|V|} \exp\left(\mathbf{v}_{l_i}^{\top} \mathbf{v}'_{l}\right)}, \qquad (2)$$

where $\mathbf{v}_l$ and $\mathbf{v}'_l$ are the input and output vectors of browsing list $l$, the parameter $m$ is the length of the sliding window in the browsing list, and $V$ is the ID set of all news. As we can see from (1) and (2), the context of user browsing sequences is modeled, so browsing lists with similar contexts will have similar embedded representations.
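To make the dataset construction above concrete, here is a minimal sketch (our illustration, not the authors' code) of cutting browsing logs into the sequences $s \in S$; it assumes records carry Unix timestamps and fixes the unspecified gap parameter at $t = 1$ day:

```python
from collections import defaultdict

GAP_SECONDS = 1 * 24 * 3600  # assumed t = 1 day; the paper leaves t as a parameter

def build_sequences(records, gap=GAP_SECONDS):
    """records: iterable of (user_id, news_id, unix_timestamp) tuples."""
    by_user = defaultdict(list)
    for user, news, ts in records:
        by_user[user].append((ts, news))
    sequences = []
    for events in by_user.values():
        events.sort()  # chronological order per user
        current = [events[0][1]]
        for (prev_ts, _), (ts, news) in zip(events, events[1:]):
            if ts - prev_ts > gap:      # long pause: start a new sequence
                sequences.append(current)
                current = []
            current.append(news)
        sequences.append(current)
    return sequences
```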

The core principle of the objective function is to evaluate the probability that the contextual neighborhood appears given the central sequence, that is, to find the values of $\mathbf{v}_l$ and $\mathbf{v}'_l$. It also follows the definition of the joint probability model. For example, consider the sentence "The weather is nice today." If we select "weather" as the target word, the sliding results are (the | weather), (is | weather), (nice | weather), and (today | weather). Assume the contextual sliding window is one word; then only (the | weather) and (is | weather) are correct samples. For the original Skip-gram model, this is a 4-class classification problem: when we input (the | weather), the probabilities of the four situations are $P(\text{the} \mid \text{weather})$, $P(\text{is} \mid \text{weather})$, $P(\text{nice} \mid \text{weather})$, and $P(\text{today} \mid \text{weather})$, and the aim is to maximize the probabilities of the correct samples. It can be seen that the probabilities of all words in the vocabulary need to be calculated at the same time. Furthermore, when backpropagation is conducted, all word vectors need to be updated; if the vocabulary is too large, the amount of calculation is very large. Assuming the word vectors are 100-dimensional, the five words here already require updating 500 parameters. Thus, a negative sampling method is proposed to optimize the calculation [16]. Firstly, $P(l_{i+j} \mid l_i)$ is replaced using the sigmoid function:

$$P(l_{i+j} \mid l_i) = \sigma\left(\mathbf{v}_{l_i}^{\top} \mathbf{v}'_{l_{i+j}}\right) = \frac{1}{1 + \exp\left(-\mathbf{v}_{l_i}^{\top} \mathbf{v}'_{l_{i+j}}\right)}. \qquad (3)$$

Secondly, positive pairs $(l, c) \in D_p$ are generated from the original dataset, where the contexts $c$ are news the user actually browsed, and negative pairs $(l, c) \in D_n$ are generated by randomly selecting from the news the user did not browse. In the above example, when the word "weather" is entered, only the probabilities of the sampled pairs, such as $P(\text{today} \mid \text{weather})$, $P(\text{very} \mid \text{weather})$, and $P(\text{nice} \mid \text{weather})$, are output, so only about 400 parameters are updated and the computation is reduced. The objective function is changed into the following form:

$$\mathcal{L} = \max \sum_{(l,c) \in D_p} \log \sigma\left(\mathbf{v}'^{\top}_{c} \mathbf{v}_l\right) + \sum_{(l,c) \in D_n} \log \sigma\left(-\mathbf{v}'^{\top}_{c} \mathbf{v}_l\right), \qquad (4)$$

where the parameters to be learned are $\mathbf{v}_l$ and $\mathbf{v}'_l$, $l \in V$. The optimization is done via stochastic gradient ascent.
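The following numpy sketch shows one stochastic-gradient-ascent step on objective (4). It is a schematic rendering under our own naming (sgd_step, the 0.025 learning rate), not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(v_in, v_out, center, pos_ctx, neg_ctx, lr=0.025):
    """One ascent step on (4) for a single central item.
    v_in, v_out: (vocab_size, d) input/output embedding matrices;
    pos_ctx: browsed context items; neg_ctx: sampled unbrowsed items."""
    for ctx, label in [(c, 1.0) for c in pos_ctx] + [(c, 0.0) for c in neg_ctx]:
        score = sigmoid(v_in[center] @ v_out[ctx])
        g = lr * (label - score)   # gradient of the log-sigmoid terms in (4)
        out_old = v_out[ctx].copy()
        v_out[ctx] += g * v_in[center]
        v_in[center] += g * out_old
    return v_in, v_out
```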

3.1.1. Views Are Used as the Global Context

In the NLP domain, multiple words form sentence sequences. However, browsing sequence data in social networks are more complex than sentence sequences in NLP: first the user's browsing behavior is generated, and then the news view data are generated as well. Users' overall behavior preferences are reflected not only by the number of news views but also by their browsing behavior. So, news views are used as an additional condition to influence the training process and improve the model. In other words, when users' browsing habits are trained by the Skip-gram model, the results are affected by the view data. For example, take "news 1, news 2, news 3, news 4, news 5" as a user browsing sequence and "view 1, view 2, view 3, view 4, view 5" as the corresponding view sequence. Firstly, "news 4" is taken as the input item; secondly, the context sliding window is set to 1; and the training process is optimized by negative sampling. Finally, the probability of the context and the views given "news 4" is calculated. The objective function is changed into the following form:

$$\mathcal{L} = \max \sum_{(l,c) \in D_p} \log \sigma\left(\mathbf{v}'^{\top}_{c} \mathbf{v}_l\right) + \sum_{(l,c) \in D_n} \log \sigma\left(-\mathbf{v}'^{\top}_{c} \mathbf{v}_l\right) + \log \sigma\left(\mathbf{v}'^{\top}_{l_b} \mathbf{v}_l\right), \qquad (5)$$

where $\mathbf{v}_{l_b}$ is the embedding of the page views $l_b$.
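Compared with (4), the only change in (5) is the extra global term $\log \sigma(\mathbf{v}'^{\top}_{l_b} \mathbf{v}_l)$, which does not depend on the sliding window. Under the same assumptions as the previous sketch, this can be rendered as treating the view item as an always-present positive context (again our illustration, reusing sgd_step from above):

```python
def sgd_step_with_global(v_in, v_out, center, pos_ctx, neg_ctx, view_item,
                         lr=0.025):
    """Step on (5): the ordinary windowed update plus the global view term."""
    v_in, v_out = sgd_step(v_in, v_out, center, pos_ctx, neg_ctx, lr)
    # The view item acts as a positive context for every central item,
    # no matter where the sliding window currently sits.
    score = 1.0 / (1.0 + np.exp(-(v_in[center] @ v_out[view_item])))
    g = lr * (1.0 - score)
    out_old = v_out[view_item].copy()
    v_out[view_item] += g * v_in[center]
    v_in[center] += g * out_old
    return v_in, v_out
```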

The improved Skip-gram model is shown in Figure 4. The model has a window of size 2n + 1; as the window slides from left to right, the context sequence and the views are predicted from the central sequence.

3.2. The Result of Item Similarity and List of Recommendations
3.2.1. The Similarity of the Target Items Is Calculated by the Model

First, the improved word embedding model is used as the feature extractor of the content-based recommendation algorithm. Then, the users' browsing behavior data are trained by the model and the users' implicit features are extracted. Assuming that there are $m$ users and $n$ texts, the user-browse feature matrix is constructed as follows, where user $u$'s behavior of browsing text $i$ is represented by the characteristic result $r_{ui}$:

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \end{pmatrix}. \qquad (6)$$

Secondly, the cosine similarity is calculated between the user's current browsing vector and each candidate vector in $S$, using the following formula:

$$\operatorname{sim}(u, u') = \cos(\mathbf{u}, \mathbf{u}') = \frac{\mathbf{u} \cdot \mathbf{u}'}{\|\mathbf{u}\| \, \|\mathbf{u}'\|}. \qquad (7)$$

Among them, the data of users $u$ and $u'$ are represented by the vectors $\mathbf{u}$ and $\mathbf{u}'$. It can be seen from (7) that the smaller the angle between the vectors, the higher the similarity.
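A direct rendering of (7) in Python (a trivial sketch for completeness; the function name is ours):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors, as in eq. (7)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```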

3.2.2. A List of Recommendations Is Generated

Finally, the $K$ recommended items are computed by the following equation, where $K$ is the maximum number of recommended items, $I$ is the set of all news, and $I_u$ is the set of news user $u$ has already browsed:

$$\operatorname{Rec}(u) = \operatorname{Top-}K_{\,i \in I \setminus I_u} \operatorname{sim}(\mathbf{u}, \mathbf{v}_i). \qquad (8)$$
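A sketch of the Top-K selection in (8), reusing cosine_similarity from the previous snippet (the function and argument names are ours):

```python
def recommend_top_k(user_vec, item_vecs, browsed_ids, k=10):
    """Return the IDs of the k unbrowsed items most similar to the user."""
    scores = {
        i: cosine_similarity(user_vec, vec)
        for i, vec in enumerate(item_vecs)
        if i not in browsed_ids          # never recommend already-browsed news
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```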

4. Experimental Results and Evaluation

4.1. Dataset

The data used in this article were collected from Caixin.com and form a public dataset of users' news-browsing records. The dataset includes 116,225 interaction records from 10,000 users during March 2014. Each record includes the user ID, news ID, browsing time (as a timestamp), news content, and news release time.

A preliminary analysis of the user-news data gives the basic distribution of the dataset shown in Table 1.

Table 1 shows that the dataset covers 10,000 users, each of whom viewed about 12 news articles on average. Among all users, the least active browsed only 5 news articles, 50% of users read more than 7, and nearly 74% read fewer than 10. The most active user read a total of 5354 news articles.

According to the news reading distribution in Table 2, there are 6183 news articles in this dataset, and each news article has been viewed by 19 users on average. The least-viewed news articles were browsed by only 1 user, 50% of the news articles were browsed by only one or two people, 75% were browsed by fewer than 8 people, and the most-read news article has 2000 browsing records.

From the distribution statistics of the above browsing data, we can see that the data are sparse. Sparsity is a mathematical index that directly measures how sparse a dataset is. Equation (9) is used to calculate the sparsity of the user behavior data:

$$\text{sparsity} = 1 - \frac{|R|}{m \times n}, \qquad (9)$$

where $|R|$ is the number of observed interaction records, $m$ is the number of users, and $n$ is the number of news articles.

The result obtained with the sparsity calculation formula is shown in Table 3.
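As a quick check, plugging the raw counts from Section 4.1 into (9):

```python
# Sparsity of the raw Caixin.com data per eq. (9)
interactions = 116_225
users, news = 10_000, 6_183
sparsity = 1 - interactions / (users * news)
print(f"{sparsity:.4%}")  # ~99.81%: almost all user-news pairs are unobserved
```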

According to the above analysis, the news reading data are highly sparse: users browse only a few news articles each, and a large number of news articles go unused. That a user has not browsed a news article does not mean the user is not interested in it. A recommendation system can help users find useful information more effectively, enhance their ability to perceive information, and even surface useful unpopular information, which again demonstrates the importance of recommendation from the data side. The very high sparsity of the dataset also exposes the limitations of traditional interest models.

4.2. Data Screening

The dataset in this paper does not directly represent users' preference for news. To address this problem, this paper adopts the negative sampling technique of reference [15]. The following rules are followed when processing the data; a sketch of the resulting sampling procedure is given after the list:

(1) For each user, the negative samples are news the user has not read.
(2) For each user, news with more than 20 views in the dataset is taken as negative sampling data and accounts for 25% of the negative sample dataset.
(3) For each user, news with 2 to 20 views in the dataset is taken as negative sample data and accounts for 75% of the negative sample dataset.
(4) Each user has the same amount of positive sample data and negative sample data.
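The sampling procedure implied by these rules can be sketched as follows (our illustration; view_counts and unread are assumed inputs giving per-news view totals and each user's unbrowsed news):

```python
import random

def sample_negatives(user, n_pos, view_counts, unread):
    """Draw as many negatives as positives for one user (rule 4):
    25% from news with >20 views (rule 2), 75% from news with
    2-20 views (rule 3), all unread by the user (rule 1)."""
    pool = unread(user)
    hot = [i for i in pool if view_counts[i] > 20]
    mid = [i for i in pool if 2 <= view_counts[i] <= 20]
    n_hot = round(n_pos * 0.25)
    return random.sample(hot, n_hot) + random.sample(mid, n_pos - n_hot)
```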

In this paper, the experimental dataset is divided into a training set and a test set. First, the user data with 0 to 2 views are eliminated. Second, each user's last browsing record is used as the test set. After this screening, the dataset contains a total of 9,543 users and 5,768 news articles, and the data sparsity is 99.81%.
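A sketch of this split (ours; the paper's wording leaves it ambiguous whether the 0-to-2-view filter applies to users, news, or both, and this sketch filters users):

```python
def split_train_test(logs, min_views=3):
    """logs: dict of user_id -> list of (timestamp, news_id).
    Drops users with fewer than min_views browses, then holds out each
    remaining user's last browse as the test item."""
    train, test = {}, {}
    for user, events in logs.items():
        if len(events) < min_views:
            continue                      # "0 to 2 views" are eliminated
        events = sorted(events)
        train[user] = [news for _, news in events[:-1]]
        test[user] = events[-1][1]        # last browsing record
    return train, test
```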

4.3. Experimental Evaluation Criteria

In this paper, F1-Score and Map are selected as the experimental evaluation indexes [18]. F1-Score is the harmonic mean of recall and precision and is mainly used as the evaluation standard for the accuracy of the recommendation system. F1_N denotes the F1-Score when N news articles are recommended. The F1-Score is calculated as follows:

$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}, \qquad (10)$$

where recall is the proportion of the original positive samples that the algorithm predicts as positive and precision is the ratio of the number of correct recommendations to the total number of recommendations.
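For one user, F1_N per (10) can be computed as in this sketch (function names ours):

```python
def f1_at_n(recommended, relevant):
    """recommended: the N recommended news IDs; relevant: the held-out IDs."""
    hits = len(set(recommended) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)
```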

Map is the mean of the average precision (AP) over all users; a high Map value indicates a better recommendation effect. Map_N denotes the Map when the algorithm recommends N news articles. It is calculated as follows:

$$\text{Map} = \frac{1}{|U|} \sum_{u \in U} \text{AP}_u, \qquad (11)$$

where $\text{AP}_u$ is the average precision of the recommendation list of user $u$.
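A matching sketch for Map_N per (11); note that AP has several common normalizations, and this sketch divides by the number of relevant items:

```python
def average_precision(recommended, relevant):
    """AP for one user: mean precision at every rank that hits a relevant item."""
    hits, precisions = 0, []
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def map_at_n(recommendations, truths):
    """Map per eq. (11): mean AP over all users' recommendation lists."""
    aps = [average_precision(r, t) for r, t in zip(recommendations, truths)]
    return sum(aps) / len(aps)
```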

4.4. Experimental Results and Analysis

We compare our proposed DLRASN with the following baselines:

(i) Content-based recommendation (CR) has a good effect in recommending similar items and is the most basic and classic recommendation algorithm.
(ii) WMD [19] combines the advantages of the singular value decomposition model and the Word2Vec model. It has good abstraction ability for extracting user characteristics.
(iii) RPE [15] uses the word embedding model to directly analyze users' interaction data to complete the recommendation task.
(iv) RPEW extends RPE by using Word2Vec to process text features. The model combines the features of news text information to train the recommendation results.

The experimental results of F1-Score are shown in Table 4 and Figure 5. The experimental results of Map are shown in Table 5 and Figure 6.

The experimental results show the following. (1) Comparing CR, RPE, and DLRASN, Skip-gram is superior to traditional machine learning methods. (2) Comparing CR, WMD, RPEW, and DLRASN, additional item features or user features can affect the recommendation accuracy of the model. (3) Comparing RPE, RPEW, and DLRASN, the recommendation algorithm containing click characteristics is superior to the one containing text features.

5. Summary

This paper first introduces the theory of Skip-gram and then proposes DLRASN. The model adopts an improved Skip-gram to learn users' behavioral habits and builds the recommendation model holistically, combined with the ideas of the content-based recommendation algorithm. On the one hand, the algorithm copes with the data sparsity faced by recommendation algorithms; on the other hand, information recommendation for users in social networks can be improved by learning features of users' preferences. Simulation experiments show that the proposed DLRASN achieves higher accuracy and is proved to be effective.

Despite the positive results this model has achieved in information recommendation, it does not distinguish positive information from negative information. Hence, how to enhance the recommendation of positive information and reduce that of negative information will be our future work.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.