Abstract

In recent years, owing to the popularity of the internet and mobile devices, the dissemination of new media in social networks has attracted extensive attention from scholars and industry. Scale prediction (or propagation speed prediction) uses early observation data to predict the eventual propagation scale in the network. In complex and changeable social networks, accurately predicting the cascade scale of new media information is the central problem at present. Because new media information plays a role in guiding public opinion, yet current hierarchical models of new media information transmission lack both global and local structure modeling, a global structure modeling method is proposed. In addition, to handle the uncertainty of new media information dissemination, bidirectional recurrent neural network prediction is combined with an analysis of algorithmic complexity, and a new method for predicting the speed and scale of new media information dissemination is constructed on the basis of a large-scale graph neural network. Comparative experiments against previous research models show that the NWIDF model constructed in this paper achieves a good prediction effect.

1. Introduction

The modeling and prediction of new media information cascades have attracted extensive attention in academia and industry in recent years [1]. Meanwhile, with the improvement of computing power, prediction models based on deep learning have been successful in many tasks.

Existing deep learning-based models can be roughly divided into three categories: (1) models based on information content, such as text, images, videos, and other multimedia content, which usually apply techniques from computer vision and natural language processing to learn effective representations of the information content; (2) models based on time series, which linearly model information cascades in social networks and rely on techniques such as recurrent neural networks, pooling mechanisms, and attention mechanisms [2]; and (3) models based on graph structures, such as information cascade graphs or global graphs [3], which usually use graph neural networks and graph representation learning to learn effective structural representations of nodes, edges, and graphs. Other deep learning techniques, such as variational inference and reinforcement learning, are also used in information cascade scale prediction. In many cases, multimodal, multiscale, and multitask learning are applied to improve prediction performance. DeepCas [4] is the first model to use graph representation learning to model and predict the scale of information cascades. It borrows the idea of the DeepWalk model [5] and uses a random walk method to sample the information cascade graph. The sampled node sequences are fed into a bidirectional gated recurrent unit [6], which cooperates with an attention mechanism [7] to obtain the node embeddings. The predictions of DeepCas are end-to-end and thus do not rely on manual feature design. Subsequently, the authors of [8] proposed the DCGT model, which adds the modeling of node content to DeepCas. The DeepHawkes model [9] combines the advantages of generative models with those of deep learning techniques, thereby achieving both predictive interpretability and good predictive performance. The ANPP model [10] uses GloVe [11] for the textual embedding of information content and node2vec [12] for user graph embedding; it then uses an attention mechanism to aggregate the acquired representations and time series feature vectors. The DTCN model [13] predicts the popularity of Flickr images by learning user and image embeddings, the shared temporal context of sequences, and a multistep temporal attention mechanism; it uses ResNet [14] and long short-term memory networks [15] to model the visual and temporal dependencies of pictures, respectively. Recurrent cascade convolutional networks [16] treat the information cascade graph as a series of subcascade graphs and then use a dynamic multidirectional graph convolutional network to learn the structural information of the information cascade.

Although deep learning-based models have achieved good results on information cascade prediction tasks, they also face many limitations and challenges. Deep learning learns deep nonlinear network structures; its essence is to approximate complex functions and build distributed representations of the input data, from which it can learn the essential characteristics of a dataset. However, cascade prediction often involves causal reasoning, logical reasoning, and handling uncertainty, which is beyond the ability of traditional deep learning methods, and the predictions of deep learning models lack interpretability, because neural networks are essentially a “black box.” Secondly, the computational consumption of deep learning models is generally higher than that of feature engineering-based prediction models and probability-based generative models. To achieve satisfactory prediction results, engineers often need to perform complex parameter tuning and model training while facing the risk of overfitting the data. At the same time, in predicting the cascade scale of new media information dissemination, existing work lacks modeling of the global and local dissemination structures, ignores hierarchical modeling, and cannot deal with the changes and uncertainties in the process of information dissemination. Therefore, this paper starts from this perspective, and related research is conducted.

3.1. Bayesian Graph Neural Network

A Bayesian network is a probabilistic graphical model. By adjusting preset parameters or the model's prior knowledge using sample data, the parameters of the Bayesian network, or the posterior probability of the model, are inferred to express uncertainty. The uncertainty of node characteristics in new media information dissemination manifests mainly as (a) uncertainty introduced during feature extraction by noisy, missing, or duplicated data and (b) uncertainty in the relationship between different node features and node labels. A Bayesian graph neural network, as a probabilistic graphical model that handles such uncertainty, identifies hot topics in new media information, compares the predictions of node labels under different features, integrates the predictions contributed by all features, and then judges the uncertainty of the node features.

In the dissemination of new media information, the network structure is not fully known and is typically constructed by domain experts, which usually leads to missing important edges, spurious added edges, and other problems, resulting in poor model prediction performance and poor robustness.

This paper therefore needs a way to add missing important edges and prune irrelevant and spurious ones; in other words, the network structure needs to be reconstructed. A Bayesian graph neural network is used to address the uncertainty of node relationships in the reconstructed network.

Generally speaking, a neural network can be regarded as a conditional distribution model $P(Y \mid X, W)$, i.e., the distribution of labels $Y$ conditioned on the input features $X$ and the neural network weights $W$; the learning process of the neural network can then be regarded as maximum likelihood estimation. Based on this, researchers proposed the Bayesian neural network [17], which infers the posterior over the weights given the dataset $D$, not only to find its maximum a posteriori value but also to introduce uncertainty into the network. The prediction $y$ for a new input $x$ is obtained by integrating over the posterior distribution of $W$:

$$P(y \mid x, D) = \int P(y \mid x, W)\, P(W \mid D)\, dW. \quad (1)$$

However, since the posterior predictive distribution in formula (1) of a Bayesian neural network is often difficult to compute directly, researchers have adopted different methods to approximate it [18–21].
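As an illustration of why approximation is needed, the sketch below performs a plain Monte Carlo estimate of the predictive integral in formula (1). Here `sample_weights` and `forward` are hypothetical stand-ins for a posterior weight sampler and the network's forward pass; they are not part of the cited methods.

```python
import numpy as np

def mc_predictive(x, sample_weights, forward, n_samples=100):
    """Approximate p(y|x, D) = integral of p(y|x, W) p(W|D) dW by Monte Carlo:
    draw weight samples W_s ~ p(W|D) and average the network outputs."""
    preds = [forward(x, sample_weights()) for _ in range(n_samples)]
    preds = np.stack(preds)                       # (n_samples, n_outputs)
    return preds.mean(axis=0), preds.std(axis=0)  # predictive mean and spread
```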

This paper considers reconstructing the network structure with a random graph generation model to solve the uncertainty of the network structure.

3.2. Random Block Model

The random block model is a generative model for random graphs. The model tends to generate graphs that contain communities, i.e., subsets of vertices characterized by a certain density of interconnecting edges. For example, edges may be more common within a community than between communities. The stochastic block model is important in statistics, machine learning, and network science, where it serves as a useful benchmark for the task of recovering community structure in graph data. Reconstructing the network structure in this way aggregates strongly correlated nodes in the network, while spurious edges between weakly correlated nodes are pruned.

The random block model has the following parameters:
(1) The number of vertices $n$
(2) A partition of the vertex set into disjoint subsets $C_1, \dots, C_k$, called groups
(3) A symmetric matrix $P$ of edge probabilities

The edge set is sampled randomly: any two nodes $u \in C_i$ and $v \in C_j$ are connected by an edge with probability $P_{ij}$.

Its generation process is shown in Algorithm 1.

Step 1: for each node $p \in \{1, \dots, n\}$ do
Step 2: construct the K-dimensional mixed-membership vector $\pi_p \sim \mathrm{Dirichlet}(\alpha)$
Step 3: for each node pair $(p, q)$ do
Step 4: construct the initiator class indicator variable $z_{p \to q} \sim \mathrm{Multinomial}(\pi_p)$
Step 5: construct the receiver class indicator variable $z_{p \gets q} \sim \mathrm{Multinomial}(\pi_q)$
Step 6: sample their interaction value $Y(p, q) \sim \mathrm{Bernoulli}\big(B(z_{p \to q}, z_{p \gets q})\big)$

The group membership of each node depends on the context, i.e., each node may have different memberships when interacting with or being interacted with by different nodes. Statistically, each node is a mixture of group-specific interactions. After the random block model is represented by a generative graph, the network can be reconstructed and applied to the graph neural network to solve the uncertainty of the network structure.
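The generative process of Algorithm 1 can be made concrete with a small numerical sketch. The following assumes the mixed-membership stochastic block model reading of the steps above, with Dirichlet memberships and a Bernoulli edge model; the parameter values are illustrative only.

```python
import numpy as np

def sample_mmsb(n, K, alpha, B, rng=np.random.default_rng(0)):
    """Sample a graph per Algorithm 1: pi_p ~ Dirichlet(alpha) per node;
    for each pair (p, q), draw sender/receiver group indicators from pi_p
    and pi_q, then an edge from Bernoulli(B[z_send, z_recv])."""
    pi = rng.dirichlet(alpha, size=n)        # (n, K) mixed-membership vectors
    Y = np.zeros((n, n), dtype=int)
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            z_send = rng.choice(K, p=pi[p])  # initiator class indicator
            z_recv = rng.choice(K, p=pi[q])  # receiver class indicator
            Y[p, q] = rng.binomial(1, B[z_send, z_recv])  # interaction value
    return Y

# Example: 3 groups, dense within groups, sparse between them.
B = np.full((3, 3), 0.02) + np.eye(3) * 0.5
adj = sample_mmsb(n=60, K=3, alpha=np.ones(3) * 0.1, B=B)
```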

3.3. New Media Information Dissemination Mechanism

In this paper, we mainly explain the mechanism of new media information dissemination from two perspectives: the information cascade graph and the user social network (global graph).

Cascade graph: given a new media microblog post $I$ and its corresponding forwarding cascade $C$, the information cascade graph can be defined as $\mathcal{G}_C = (\mathcal{V}_C, \mathcal{E}_C)$, where $\mathcal{V}_C$ is the set of user nodes participating in the information cascade and $\mathcal{E}_C$ is the set of edges representing all user interactions in the information cascade graph. A schematic diagram of a cascade graph growing over time is shown in Figure 1.

Global graph: the global graph contains all the nodes and edges in the social network and can be defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. The edges represent the different node relationships underlying the information cascade. A typical example of a global graph is the following/followed network of users in TikTok.

In this paper, the information cascade graph represents the local propagation characteristics of information in the network, while the global graph represents the associations between nodes in the whole network. Taking TikTok as an example, the following relationships, forwarding relationships, and historical behaviors among users can all be reflected in the structure of the global graph. Previous work [8–10] simply used features such as the user's number of followers (which can be regarded as node degree) as the user's structural features, which cannot fully capture attributes such as the user's influence and preferences. A few other works [14, 17] use other types of structural features, but they all make strong assumptions about the intrinsic mechanism of information dissemination or face the risk of overfitting on specific data; as a result, their generalization performance is poor, and they are less effective when migrated to other applications or data platforms (where the propagation mechanism differs or is unknown).

4. New Media Information Dissemination and Scale Prediction Path Based on Large-Scale Graph Neural Network

4.1. The Overall Architecture of the Prediction Model

This section builds the general framework of the NWIDF prediction model. It consists of four parts: structure learning, time series propagation, new media information uncertainty propagation, and a predictor. Structure learning captures and models the contextualized structural patterns in information cascade graphs and the implicit relationships of users in social networks. It leverages techniques from graph signal processing to learn structural representations of information cascades: local structure modeling based on graph wavelets and global user structure modeling based on sparse matrix factorization. Temporal propagation uses a bidirectional recurrent neural network to model temporal dependencies in information propagation. Uncertainty propagation uses a variational autoencoder to model the changes and uncertainty in information propagation and growth, and it uses a normalizing flow, a series of complex and flexible transformations, to estimate the posterior distribution of the hidden variables. The predictor combines the recurrent neural network and variational inference to learn high-order representations of the information cascade and finally uses a multilayer perceptron to predict the final size of the information cascade, as shown in Figure 2.

At the core of the NWIDF model system, new media information actors are the main bodies of information production, transmission, processing, and management; they usually include users and platforms. Users can enhance the quality and impact of new media platforms by providing feedback that drives continuous optimization. New media information technology supports these information activities: through the collection, processing, dissemination, and feedback of information, the continuous operation of the NWIDF model system is realized.

4.2. Modeling of New Media Information Cascade Structure under Large-Scale Graph Neural Network
4.2.1. New Media Information Cascade Learning Structure

In the new media information dissemination mode, the cascade graph $\mathcal{G}_C$ is introduced and represented as an adjacency matrix, with a self-loop added to each node, as shown in Figure 3. Then, according to the arrival time of each node in the cascade graph, one-hot encoding is performed to represent the node features: the observation window is divided into disjoint fine-grained time intervals, and each time interval is encoded.
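A minimal sketch of this one-hot interval encoding, assuming equal-width intervals over the observation window (the paper's exact binning may differ):

```python
import numpy as np

def encode_arrival_times(arrival_times, T_obs, n_intervals):
    """One-hot encode each node's arrival time by the fine-grained interval
    it falls into; the observation window [0, T_obs) is split into
    n_intervals disjoint bins of equal width."""
    bins = np.floor(np.asarray(arrival_times) / T_obs * n_intervals).astype(int)
    bins = np.clip(bins, 0, n_intervals - 1)
    return np.eye(n_intervals)[bins]         # (n_nodes, n_intervals)

# e.g., three retweets within a 1-hour (3600 s) window, six 10-minute bins
feats = encode_arrival_times([30, 700, 3400], T_obs=3600, n_intervals=6)
```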

If node $u$ forwards to node $v$ at time $t$, the corresponding entry of the cascade graph's adjacency matrix at that moment is 1, and the remaining entries are 0:

$$A_t(u, v) = \begin{cases} 1, & \text{if } u \text{ forwards to } v \text{ at time } t, \\ 0, & \text{otherwise.} \end{cases}$$

To capture the global graph structure during cascading information diffusion in new media, we use a graph convolutional network to learn the Markov process underlying information diffusion [22], which converges to a stable distribution after a period of diffusion, analogous to convergence to a stationary distribution [23]. Therefore, we let the cascade Laplacian matrix follow the random walk characteristics of the cascade graph and take the Markov state transition probability matrix $P = D^{-1}A$. According to the graph convolutional network formulation, the Laplacian matrix can be written as

$$L = I_n - D^{-1}A,$$

where $D$ is the degree matrix of the cascade graph and the convolution aggregates up to $K$ neighborhood layers (i.e., powers $L^k$, $k \le K$).

Let $L = U \Lambda U^{\top}$ be the eigendecomposition of the Laplacian, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ contains the eigenvalues satisfying $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ and $U$ is the matrix of eigenvectors. We can then compute the graph wavelet for each node $u$ as

$$\psi_s(u) = U\, \mathrm{diag}\big(g_s(\lambda_1), \dots, g_s(\lambda_n)\big)\, U^{\top} \delta_u,$$

where $\delta_u$ is the one-hot encoded vector of node $u$ and the filter kernel $g_s$ is a continuous function defined on the spectrum. Here, we use the heat kernel $g_s(\lambda) = e^{-s\lambda}$, where $s$ is a scale parameter defined on the spectrum $[\lambda_1, \lambda_n]$.

In particular, for a given node $u$ and scale parameter $s$, the empirical characteristic function is formally defined as

$$\phi_u(t) = \frac{1}{n} \sum_{m=1}^{n} e^{\,i\, t\, \psi_s(u)_m},$$

where $\psi_s(u)_m$ is the $m$th wavelet coefficient of $\psi_s(u)$. The embedding of node $u$ in the information cascade graph is then obtained by sampling $\phi_u(t)$ at $d$ points and concatenating the real and imaginary parts.

The dimension of the resulting node embedding is $2d$; in addition, the first element of the embedding is set to the weight of the node's edges, defined and regularized accordingly.
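The wavelet and characteristic-function computations above can be sketched directly from the eigendecomposition. The following is a naive dense implementation for small graphs, assuming the heat kernel $g_s(\lambda) = e^{-s\lambda}$; a production system would use polynomial approximations instead of a full eigendecomposition.

```python
import numpy as np

def wavelet_embeddings(L, s, t_grid):
    """Heat-kernel graph wavelets and characteristic-function embeddings.
    L: graph Laplacian (n, n); s: scale; t_grid: sample points t_1..t_d.
    psi_s(u) = U diag(exp(-s * lam)) U^T delta_u; the embedding concatenates
    Re/Im of phi_u(t) = (1/n) * sum_m exp(i * t * psi_s(u)_m)."""
    lam, U = np.linalg.eigh(L)                 # L = U diag(lam) U^T
    Psi = U @ np.diag(np.exp(-s * lam)) @ U.T  # column u holds psi_s(u)
    emb = []
    for u in range(L.shape[0]):
        phi = np.exp(1j * np.outer(t_grid, Psi[:, u])).mean(axis=1)
        emb.append(np.concatenate([phi.real, phi.imag]))
    return np.array(emb)                       # (n, 2 * len(t_grid))
```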

From the sentence perspective, the sentiment of a long sentence is mainly determined by several keywords connected to the root node, while the contribution of other words is ignored, resulting in the loss of key information. In recent years, the rapid development of Internet+ has made sentiment analysis of comment texts occupy a significant share of user-based big data analysis. Compared with the inflexibility of traditional machine learning methods, deep learning methods can be more efficient and accurate. Because emotional information is contained in text, extracting it through deep learning is currently a popular research field that has achieved good results. Since a machine cannot directly process plain text input, the text must be vectorized into a numerical form that the machine can recognize.

The adjacency matrix generated by a dependency tree contains a large number of zero elements, which may cause information loss and data sparsity. To solve this problem, this paper constructs a global graph matrix: a layer of identity matrix is added to the original adjacency matrix, yielding the global graph matrix. The construction of the global matrix is shown in the following equations.

Setting all elements on the diagonal of the adjacency matrix to 1 ($\tilde{A} = A + I$) means that each node in the graph performs a self-loop operation. $A_{ij} = 1$ indicates that the $i$th node has a directed connection to the $j$th node, and $A_{ij} = 0$ indicates that there is no connection between them. When generating the adjacency matrix, an identity matrix is added, which means an edge is added at each node in the graph structure; this allows the graph structure to contain global dependency information. The resulting matrix, connecting all nodes in the graph, is the global graph matrix. This operation lets every word play its corresponding role, avoiding data sparsity and incomplete information.
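A two-line sketch of this global matrix construction, assuming a binary directed adjacency matrix:

```python
import numpy as np

# Add a layer of identity to the adjacency matrix so every node has a
# self-loop, yielding the global graph matrix described above.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])                    # A[i, j] = 1: directed edge i -> j
A_tilde = A + np.eye(A.shape[0], dtype=int)  # self-loops keep each node's own role
```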

Compared with node embeddings in the information cascade graph, node embeddings in the global graph express a very different notion of information propagation. In the information cascade graph, whether for influential nodes, hub nodes connecting different communities, or inconspicuous leaf nodes, nodes occupying similar structural positions have similar embeddings even if they are far apart in the graph; this positional property is captured by the propagation pattern of the graph wavelet. In the global graph, the low-dimensional continuous embeddings learned by the model preserve the neighborhoods of nodes; hence, nodes with similar preferences and behaviors have similar spatial embeddings.

Unlike information cascade graphs, global graphs often contain millions of nodes and edges, making representation learning on them very difficult; existing graph learning models [15, 18] are hard to apply directly to practical information cascade prediction problems. This paper uses sparse matrix factorization to process and model large-scale global graphs efficiently and scalably.
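One scalable realization of this idea, offered as a sketch rather than the paper's exact factorization, is a truncated SVD of the sparse global adjacency matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def global_embeddings(edges, n_nodes, dim=64):
    """Truncated SVD of the sparse global adjacency matrix as a stand-in
    for sparse matrix factorization: neighbors in the global graph end up
    with nearby embeddings, and only k singular pairs are ever computed."""
    rows, cols = zip(*edges)
    A = csr_matrix((np.ones(len(edges)), (rows, cols)),
                   shape=(n_nodes, n_nodes))
    U, S, _ = svds(A.astype(float), k=dim)   # requires dim < n_nodes
    return U * np.sqrt(S)                    # (n_nodes, dim) node embeddings
```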

4.2.2. Temporal Propagation Build

Above, we used graph wavelets and sparse matrix factorization to generate embeddings that encode the structural information of users in the information cascade graph and the global graph. In particular, they have the following properties: (1) structurally equivalent nodes in an information cascade graph have similar embeddings (refer to [20]); for example, hub nodes have stronger propagation capabilities than leaf nodes. (2) Adjacent nodes in the global graph have similar embeddings, i.e., adjacent nodes have similar preferences for disseminating specific information.

In addition to the structural information contained in the information cascade, time series information is considered one of the most important features in the cascade scale prediction problem and has a key impact on the final size of the information cascade. To capture the temporal nature of information cascades, we use bidirectional gated recurrent units (BiGRUs) to model the cascade effects; recurrent neural networks are widely used for modeling time series data and are employed here to model the temporal features of information dissemination. The BiGRU computation is given by formulas (12)–(14):

$$\overrightarrow{h_t} = \overrightarrow{\mathrm{GRU}}\big(x_t, \overrightarrow{h}_{t-1}\big), \quad (12)$$
$$\overleftarrow{h_t} = \overleftarrow{\mathrm{GRU}}\big(x_t, \overleftarrow{h}_{t+1}\big), \quad (13)$$
$$h_t = W_{\overrightarrow{h}}\, \overrightarrow{h_t} + W_{\overleftarrow{h}}\, \overleftarrow{h_t} + b_t, \quad (14)$$

where $\overrightarrow{h_t}$ is the hidden state of the forward pass at time $t$, $\overleftarrow{h_t}$ is the hidden state of the backward pass, $h_t$ is the output hidden state, $x_t$ is the input, $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ are weight matrices, and $b_t$ is the bias vector.

BiGRU comprises a forward GRU pass and a backward GRU pass. A bidirectional GRU can enrich the representation of contextual information around aspect words and enhance the interaction of information in a complementary form, so that more useful information is captured than with a unidirectional GRU; in practice, bidirectional GRUs also usually perform better. A deep BiGRU extends the network depth by stacking multiple BiGRU layers on top of a single layer, feeding the output of each BiGRU layer into the corresponding nodes of the next layer.
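A stacked BiGRU of this kind can be sketched in a few lines with tf.keras (the paper implements NWIDF in TensorFlow; the layer sizes here are illustrative, not the paper's settings):

```python
import tensorflow as tf

# Stacked (deep) BiGRU over the sequence of node embeddings in a cascade:
# each BiGRU layer feeds the corresponding positions of the next layer.
cascade_encoder = tf.keras.Sequential([
    tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(32, return_sequences=True),
        input_shape=(None, 128)),            # (timesteps, node-embedding dim)
    tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(16, return_sequences=True)),
])
# h_t concatenates forward and backward hidden states: output dim = 2 * 16.
states = cascade_encoder(tf.random.normal([4, 10, 128]))  # -> (4, 10, 32)
```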

However, using only the hidden state of the last RNN layer has certain drawbacks for information cascade prediction. This is because of the flat sequence generation process in recurrent neural networks, where the embedding of each node depends on the node embedding at the previous time step: the model is forced to generate all higher-order information in a deterministic, step-by-step manner. This setting significantly limits the exploration of uncertain dependencies in information cascades. In addition, because of the limitations of RNNs themselves, these models cannot handle long-term dependencies, and their predictive performance may drop significantly when the information cascade is very long.

4.2.3. Uncertainty Modeling of New Media Information

An information cascade $C$ consists of a growing sequence of participants, each associated with a learned representation corresponding to a specific stage of information dissemination. Above, for each node in the information cascade graph and the global graph, we used graph wavelets and sparse matrix factorization, respectively, to learn its embedding representations. More generally, any other type of graph representation learning method, for example, text and image embeddings, can be used to enhance the learning ability of the model. Without ambiguity, we use $x_i$ to denote the representation of each participant in the information cascade $C$, i.e., $C = \{x_1, x_2, \dots\}$.

Let $\mathrm{Enc}(\cdot)$ be the input encoder and $\mathrm{Dec}(\cdot)$ be the reconstruction decoder. A neural network-based deep variational autoencoder can then be defined as

$$z = \mathrm{Enc}(x), \qquad \hat{x} = \mathrm{Dec}(z),$$

where $\hat{x}$ is the reconstructed input and $z$ is the hidden vector. The variational autoencoder accepts high-dimensional data as input and generates a compressed hidden representation sampled from a conditional prior distribution with mean $\mu$ and variance $\sigma^2$; the original input is then reconstructed from this hidden representation.

To learn an efficient probability-based representation from the information cascade data that captures the variation and uncertainty of the cascade propagating in the network, the variational autoencoder produces $\mu$ and $\sigma$ from the output vector of the encoder and then uses the reparameterization trick to sample the hidden vector from the Gaussian distribution [22]: $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$.
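A minimal sketch of this sampling step, assuming the encoder outputs a mean and a log-variance vector:

```python
import tensorflow as tf

def reparameterize(mu, log_var):
    """Reparameterization trick: sample z = mu + sigma * eps with
    eps ~ N(0, I), keeping the sampling step differentiable."""
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

# The encoder outputs mu and log-variance; z feeds the decoder.
mu, log_var = tf.zeros([8, 16]), tf.zeros([8, 16])  # placeholder encoder outputs
z = reparameterize(mu, log_var)
```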

Given a hidden random variable (in this paper, the $z$ learned in the higher-order variational autoencoder), a normalizing flow is a class of generative models that transforms the observed vector $z_0$ into the required target hidden vector $z_K$. The transformation consists of a series of $K$ invertible mappings whose Jacobian matrices are computable and whose functions are differentiable. In more detail, the normalizing flow uses mapping functions $f_k$ defined by the change-of-variables formula

$$z_K = f_K \circ \cdots \circ f_1(z_0), \qquad \log q_K(z_K) = \log q_0(z_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial z_{k-1}} \right|, \quad (15)$$

where $q_0$ is the distribution of the initial random vector $z_0$ and each transfer function $f_k$ is invertible. To obtain an effective probability density from the initial density $q_0$, a hierarchy of $K$ normalizing flow transformations successively applies equation (15) to compute the target density.

If the mapping functions are chosen appropriately, the learned mixture distribution of hidden random vectors matches the distribution of the real data more closely than a simple independent Gaussian distribution.
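To make equation (15) concrete, the sketch below composes planar flows, one standard family of invertible mappings with a cheap log-determinant. This is an illustrative choice, as the paper does not specify the flow family, and planar flows additionally require a constraint on $u$ to remain invertible (omitted here).

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar transform f(z) = z + u * tanh(w.z + b) and its
    log |det Jacobian|, the per-step term in equation (15)."""
    a = z @ w + b                            # (batch,)
    f_z = z + np.outer(np.tanh(a), u)        # transformed samples
    psi = np.outer(1 - np.tanh(a) ** 2, w)   # h'(a) * w, shape (batch, d)
    log_det = np.log(np.abs(1 + psi @ u))    # (batch,)
    return f_z, log_det

# Stack K flows: log q_K(z_K) = log q_0(z_0) - sum_k log|det J_k|
rng = np.random.default_rng(0)
z = rng.normal(size=(5, 4))
log_q = -0.5 * (z ** 2).sum(axis=1)          # unnormalized log N(0, I)
for _ in range(3):                           # K = 3 hypothetical flow steps
    u, w, b = rng.normal(size=4), rng.normal(size=4), 0.1
    z, ld = planar_flow(z, u, w, b)
    log_q -= ld
```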

4.2.4. New Media Information Dissemination Speed and Scale Effect Predictor

Previous studies have found that information cascades exhibit a time decay effect, i.e., the influence of one node on other nodes decreases over time [5–7]. In this paper, a nonparametric time decay function $\lambda(\cdot)$ is used; following the literature [25], the cascade representation is obtained as

$$h_C = \sum_{i} \lambda(t_o - t_i)\, h_i,$$

where $t_i$ denotes the forwarding time of the $i$th piece of new media information, $t_o$ is the observation time, and $h_i$ is the corresponding hidden state; weighting the forwarding amount in this way indicates that the time decay function is taken into account.

The last part of the NWIDF model consists of fully connected layers (an MLP). From the representation $h_C$ computed above, the prediction is calculated as

$$\Delta \hat{S} = \mathrm{MLP}(h_C).$$

The final task is to predict the increment of information dissemination within the specified time interval, for which the mean squared log-transformed error (MSLE) is introduced:

$$\mathrm{MSLE} = \frac{1}{N} \sum_{i=1}^{N} \big( \log \Delta \hat{S}_i - \log \Delta S_i \big)^2. \quad (22)$$

This is used as the loss function, and the Adam optimizer is used to minimize the loss value.
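A sketch of this loss and optimizer setup in tf.keras (the +1 offset guards against zero increments and the learning rate is illustrative):

```python
import tensorflow as tf

def msle_loss(delta_true, delta_pred):
    """Mean squared log-transformed error between true and predicted
    cascade increments, per equation (22); +1 avoids log(0)."""
    return tf.reduce_mean(
        tf.square(tf.math.log(1.0 + delta_true) - tf.math.log(1.0 + delta_pred)))

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-4)  # rate is illustrative
```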

Through the above synthesis, the following training process can be performed as shown in Algorithm 2.

input: cascade graph $C$, sequence of cascade graph adjacency matrices $\{A_t\}$, observation time window $T$
output: predicted information cascade incremental scale $\Delta \hat{S}$
(1) compute the Laplacian matrix $L$ of the cascade graph $C$;
(2) compute the graph wavelet $\psi_s(u)$ for each node $u$;
(3) compute the node embeddings of the information cascade graph;
(4) compute the node embeddings of the global graph and the global matrix;
(5) while not converged do
(6) train a bidirectional gated recurrent unit to acquire the hidden states $h_t$;
(7) for each user $i$ in the cascade do
(8) calculate the decay-weighted hidden state $\lambda(t_o - t_i)\, h_i$;
(9) end for
(10) obtain $h_C$;
(11) train the cascade variational autoencoder to obtain $z_0$;
(12) obtain $z_K$ by $K$ flow transformations;
(13) combine $h_C$ and $z_K$ to make the final incremental scale prediction;
(14) end while

5. Experiment Setup and Results Analysis

5.1. Test Setup
5.1.1. Dataset

To evaluate the effectiveness and scalability of the NWIDF model in information cascade prediction, experiments are conducted on publicly available datasets and compared with previous studies. Statistics of the datasets are shown in Table 1.

Weibo [25]: this dataset contains all original posts generated on Sina Weibo on June 1, 2016, tracking all retweets of each post over the following 24 hours, for a total of 119,313 posts. Figure 4(a) shows the distribution of cascade sizes; Figure 5(a) shows the popularity growth of cascades, indicating that popularity saturates after 24 hours. This paper follows a setup similar to CasCN [26], i.e., observation time windows of T = 1, 2, and 3 hours. After preprocessing, the cascades are sorted by their publication time; the first 70% are selected as the training set, and the rest are split equally into the validation set and the test set.

HEP-PH [27]: the HEP-PH (high energy physics phenomenology) dataset comes from the arXiv e-print citation network. The data cover papers from January 1993 to April 2003 (124 months) and include the citations of all 34,546 papers. If paper i cites paper j, the citation graph contains a directed edge from i to j; if a paper cites or is cited by a paper outside the dataset, the graph contains no information about it. Figure 4(b) shows the distribution of cascade sizes, and Figure 5(b) shows the popularity growth of cascades. Observation windows of T = 3, 5, and 7 years were chosen, corresponding to popularity reaching 50%, 60%, and 70% of the final scale, respectively, as shown in Figure 5(b). Then, 70% of the cascades are used for training, and the rest are split equally into validation and test sets.

5.1.2. Benchmark Model Selection

Prediction methods range from traditional causal analysis, statistical, and time series analysis methods to modern artificial neural networks, wavelet theory, and gray system theory; each has its own advantages because of its different mechanism and applicable environment. To verify the effectiveness of the proposed NWIDF model in predicting the scale of information cascades, we choose three baseline models: the topology-aware Topo-LSTM model [29], the generative model-based DeepHawkes model [25], and the deep learning-based CasCN model [26]. These baselines are recent, highly reliable models in this research field and support the comparative analysis.

5.1.3. Parameter Setting

All experiments in this paper are performed on the Ubuntu 16 operating system with an Intel Core i9-9980XE CPU, 128 GB of memory, and an NVIDIA Titan RTX (24 GB) graphics card.

For DeepCas [28], DeepHawkes [25], Topo-LSTM [29], and CasCN [26], the user embedding dimension is set to 50, following DeepCas. The numbers of hidden units in the two fully connected layers of the recurrent neural network are 32 and 16, respectively. The user embedding learning rate and the overall learning rate follow the settings of the corresponding original models. The batch size of each iteration is 32, and model training stops when the validation loss fails to drop for 50 consecutive iterations. The time interval is set to 10 minutes for the Weibo dataset and 2 months for HEP-PH.

This paper uses TensorFlow to implement the NWIDF model and uses the Adam optimizer to optimize the parameters through gradient descent. The embedding neighborhood layer of graph representation learning adopts K = 2, and the remaining model parameter settings are consistent with the models above.

5.1.4. Evaluation Indicators

Following existing work, the standard evaluation metric MSLE (see equation (22)) is used in the experiments to evaluate prediction accuracy. Note that the smaller the MSLE, the better the prediction performance.

5.2. Result Analysis
5.2.1. Experimental Comparative Analysis

(1) Performance Comparison of the Benchmark Version. The benchmark version of the NWIDF model proposed in this paper is experimentally compared with previous cascade prediction models on real datasets.

The DeepCas model [28] is the first deep learning architecture for information cascade prediction, which represents a cascade graph as a set of random walk paths, piped through a bidirectional GRU neural network with an attention mechanism to predict the size of the cascade. It mainly uses the information of structure and node identity for prediction.

The DeepHawkes model [25] integrates the predictive power of end-to-end deep learning into the interpretable factors of the Hawkes process for popularity prediction. The combination between deep learning methods and cascade dynamics modeling processes bridges the gap between the prediction and understanding of information cascades. This method belongs to both generative and deep learning-based methods.

The Topo-LSTM model [29] is a directed acyclic graph structure (DAG structure) RNN that takes a dynamic DAG as input and generates topology-aware embeddings as output for each node in the DAG, thereby predicting the next node.

Pak et al. proposed a particulate matter (PM) prediction model (CNN-LSTM) based on a spatiotemporal convolutional network and a long short-term memory network and applied it to PM2.5 concentration prediction in Beijing. Mutual information is used to analyze spatiotemporal correlations, considering the linear and nonlinear correlations between the target and the observed parameters; combining historical air quality and meteorological data, spatiotemporal feature vectors (STFV) reflecting these correlations are constructed. The CNN-LSTM prediction model extracts the inherent relationships between PM2.5-related lagged air quality and meteorological input data through the CNN and captures the long-term historical behavior of the input time series through the LSTM. The validity of the model is verified using three years of air quality and meteorological data from 384 monitoring stations across the country [26].

Singh et al. used deep learning for stock prediction and proposed a model based on two-dimensional principal component analysis ((2D)2PCA) and a deep neural network (DNN). Thirty-six indicators, such as the closing price, highest price, lowest price, and opening price, are used as the input of the stock prediction model; the original data matrix is projected into a lower-dimensional projection matrix by (2D)2PCA, and the dimension-reduced data are then fed into the DNN to obtain the predicted closing price. Compared with a radial basis function neural network (RBFNN) on Google stock from Nasdaq, the prediction rate is improved by 4.8%, and the actual return (i.e., the correlation coefficient with the predicted return of information dissemination) is 17.1% higher than that of RBFNN [27].

The CasCN model [26] combines structural and temporal deep learning: it uses a graph convolutional network to capture the spatial structure of the network and incorporates a temporal decay function with a recurrent neural network to use temporal information more efficiently. This model is a deep learning method.

Table 2 summarizes the performance comparison between the NWIDF model and the other benchmark models on the Weibo and HEP-PH datasets. The comparison between NWIDF and DeepCas shows that simply embedding nodes is not enough for graph representation and that representing the graph as a set of random paths is insufficient. Because DeepCas fails to consider the timing information and topology of cascade graphs, its performance is worse than the other deep learning-based methods. Topo-LSTM also lacks processing of timing information, resulting in poor performance. Although DeepHawkes models cascades in a generative manner, it does not perform optimally because of its weak ability to learn structural information. CasCN considers temporal information and spatial topology but ignores the fusion of the two features. Finally, the NWIDF model proposed in this paper performs information cascade prediction (tweet retweets and paper citations) on both datasets significantly better than the other models. For example, on the Weibo dataset with observation windows of 1, 2, and 3 hours, the MSLE values are 2.123, 2.012, and 1.776, respectively; on the HEP-PH dataset with 3, 5, and 7 years, the MSLE values are 0.939, 0.843, and 0.812, respectively, showing a good prediction effect. Compared with CasCN, the prediction errors of the NWIDF model are reduced by 5.31%, 1.18%, and 7.31% on Weibo and by 6.47%, 8.07%, and 8.46% on HEP-PH, confirming the effectiveness of the model.

5.2.2. The Influence of Global Graph on Information Cascade

NWIDF-All: we remove the structure learning module from the NWIDF model. In NWIDF-All, all nonroot nodes in the information cascade graph are directly connected to the root node, and no global graph information is used.

Firstly, the validity of the bidirectional recurrent neural network is verified through a designed experiment. We construct a reduced version, NWIDF-GRU, which keeps the BiGRU of the benchmark version, and compare it with CasCN; NWIDF-GRU is equivalent to adding an attention mechanism on top of the CasCN model. In the experiment, the parameters of the two models are identical, and the results for sampling neighborhood layers K = 1, 2 are shown in Table 3.

Table 3 gives the performance comparison between the embedding layers K = 1, 2 and the CasCN model with K = 2. According to Table 3, when K = 2, the NWIDF model proposed in this paper outperforms the CasCN model, because considering the node embeddings of the global graph couples the timing information with the spatial structure information. When K = 1, after observing Weibo for 2 hours, the MSLE of NWIDF-All is larger than that of CasCN: with K = 1, insufficient spatial structure information is captured, resulting in slightly worse results.

Then, the effect of timing information on cascade prediction is verified. We analyze the NWIDF variant NWIDF-BiGRU, i.e., the NWIDF model with the bidirectional recurrent neural network removed, and compare it with CasCN using BiGRU; NWIDF-BiGRU is equivalent to replacing the LSTM in the CasCN model with a BiGRU. From the data in Table 4, when K = 2, the MSLE of 1.783 after observing Weibo for 3 hours and the MSLE of 0.84 after observing HEP-PH for 7 years are both better than the MSLE of CasCN, confirming the importance of timing information in information cascades.

Finally, to verify the joint impact of time series information and spatial information on cascade prediction, we use BiGRU on top of CasCN, add the structure learning mechanism, and adjust the number of embedded neighborhood layers K to compare with CasCN. The data in Table 5 show that for K = 1, 2 on both the Weibo and HEP-PH datasets, the MSLE values are smaller than those of CasCN, indicating that the model proposed in this paper captures time series and spatial topology information more comprehensively than CasCN, improving model efficiency and reducing the loss rate.

The above experiments show that time series information and spatial structure information both have an important impact on information cascade prediction. Combining the two better ensures prediction accuracy, captures information more comprehensively, and makes the model more generalizable.

6. Conclusion

Against the background of wrong public opinion orientation caused by the rapid and wide spread of new media information, predicting the spread of new media information and its scale effect has great practical significance. Given that current hierarchical models of new media information dissemination lack global and local models, this paper starts from the fact that new media information follows node-based dissemination and studies the prediction of the speed and scale effect of new media information dissemination. The main research contents are as follows [24]: (1) A modeling method for the local and global propagation characteristics of new media information is proposed; considering the uncertainty and scale effect of information propagation, a graph neural network-based NWIDF model is proposed, which starts from the information cascade structure and performs structure learning for information propagation based on locality and globality. (2) On two large-scale information cascade datasets (Weibo and HEP-PH), current mainstream and advanced prediction models are applied for experimental comparison, and the NWIDF model is found to have better prediction effect and performance. To a certain extent, it can predict the spread and scale of new media information and public opinion, helping to control the rapid spread of wrong public opinion and to quickly cut off its communication channels.

The NWIDF model in this paper addresses a hotspot of current network and graph neural network research; it focuses on the large-scale graph structure and scale effect of new media information dissemination, and the complexity of the research content leaves other types of features aside. In particular, the NWIDF model is not extended to other types of features, such as various content features (users' follower counts, authors' h-indexes, historically published articles, etc.). However, it provides a richer theoretical and practical basis for the future use of more powerful and complex graph neural networks, such as heterogeneous information networks with multiple node and edge types. The model can also be generalized to other graph-based applications, such as viral information diffusion, interpretable information prediction, rumor detection, and epidemic control.

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.