Abstract

In recent years, deep neural networks have achieved great success in many fields, such as computer vision and natural language processing. Traditional image recommendation algorithms use text-based recommendation methods. Preparing images for such methods requires a lot of time and manual labor, which is inefficient. Therefore, this article studies image recommendation algorithms based on deep neural networks in social networks. First, according to the timestamp information of the dataset, the interaction records of each user are sorted by time. Then, feature vectors are created either with traditional feature algorithms such as LBP, BGC3, and RTU or with CNN extraction. For image recommendation, two LSTM neural networks are established, each of which accepts these feature vectors as input. The compressed output of the two sub-LSTM neural networks is used as the input of another LSTM neural network. A multilayer regression algorithm randomly samples some network nodes to obtain the cognitive information of the sampled nodes, predicts the relationships between all nodes in the network based on this information, and performs low-rate sampling to achieve relationship prediction. The experiments show that the proposed LSTM model combined with CNN feature vectors can outperform the other algorithms.

1. Introduction

Nowadays, images and videos have become an important way for people to obtain and transmit information in daily life. Deep learning (DL) is one of the most popular branches of machine learning. In the past few years, it has attracted growing attention in academia and industry through its applications in computer vision, audio, and text processing, and it has achieved a series of important results in these complex tasks. Social network analysis focuses on the mathematical modeling of node relationships in the network. Researchers use mathematical models to abstract the relationships between people in daily life and use mathematical tools such as graphs and matrices to quantify these relationships and ultimately reveal the deeper rules hidden beneath them. Social networks are an important platform for interacting with friends and obtaining information. On these platforms, efficient recommendation algorithms are important tools that help users filter large amounts of information [1–5].

The ability of deep neural networks to learn features and complex mapping relationships automatically can improve prediction accuracy. Deep learning is essentially an artificial neural network system supported by large amounts of data. Generally, deep learning models have a large number of parameters and need to be trained on large amounts of data. Deep learning is computationally demanding, so deep neural networks are usually trained and run on high-performance parallel machines. Social relationships are important information that reflects user interests and hobbies in social networks and are an important factor to consider when designing effective algorithms. Therefore, it is necessary to study recommendation based on social relations in social networks, in order to further explore the potential value of social relations and provide technical support for the development of social networks and recommendation systems [6–10].

2. Deep Neural Network and Image Recommendation Algorithm

2.1. Deep Neural Network

Learning with deep neural networks (DNNs) is also called deep learning. A deep neural network applies multiple nonlinear mapping functions in succession, so it can fit very complex functions. Existing studies that use DNNs for recommendation can be roughly divided into two categories. (1) A DNN is used to process rich input data (visual features of photos, text descriptions, etc.) and extract effective feature representations, which are integrated into the recommendation model as auxiliary information. (2) Collaborative filtering is the core of the recommendation system, and its purpose is to model the interaction between users and items. A DNN can approximate any continuous function and can therefore be used to model the high-order nonlinear interactions between users and items [11].

The characteristics of deep neural networks are as follows. First, the general form of deep learning is a multilayer artificial neural network. A deep learning model often contains millions or even tens of millions of parameters. To support such a large model, a large number of training samples is needed to improve its generalization ability. The scale of the network and the amount of data together make deep learning computationally intensive. Second, feature extraction and classification are automatic. The main task of a deep neural network is to extract features from data automatically. Features are abstracted through the combination of successive layers, which gives them a hierarchical character. Local, low-level features are extracted in the lower layers of the network (near the input). These features are combined layer by layer and transformed into global, abstract features in the higher layers (far from the input). The significance of the features is gradually strengthened from low to high [12].

Convolutional neural networks (CNNs) learn more essential features from the original image by repeating convolution and pooling operations. Classification and recognition are usually performed in one or more fully connected layers at the end of the network, or the extracted features are fed into other shallow machine learning models. Unlike the usual BP neural network, a CNN relies on three important concepts: local receptive fields, weight sharing, and pooling. The CNN structure includes two kinds of layers, a feature extraction layer and a feature mapping layer. Local perception and weight sharing are used in feature extraction, which greatly reduces the number of training parameters and makes training easier [13]. Since the input of each neuron is connected only to a part of the previous layer (i.e., the local receptive field), features can be extracted locally. The positional relationship between the features is fixed after the feature extraction operation. Each feature map corresponds to a plane, and all neurons on the plane share the same weights. The feature extraction layer accounts for most of the computation in the network. After the mapping, an activation function is applied, which helps keep the features invariant to displacement and rotation. The convolution operation slides the convolution kernel over the input image and multiplies the two element by element. Since each convolution kernel learns one feature, multiple kernels are needed to learn multiple features of the image. After the features are obtained by convolution, a pooling operation is usually performed, mainly to prevent overfitting. Modern convolutional neural networks tend to use smaller and smaller convolution kernels (down to 1x1) and more convolutional layers. The more layers there are, the more abstract the learned features; with fewer layers, the learned features are more comprehensive. The features learned at each layer are passed to the next layer, and each layer acts as a digital filter on its input data. In a CNN, local perception enables the neurons to access the most basic features [14].
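The convolution and pooling operations described above can be illustrated with a minimal sketch. It is not the network used in this paper: the 5x5 image, the 2x2 kernel, and the function names `conv2d_valid` and `max_pool2d` are illustrative assumptions, and the convolution is computed without kernel flipping, as is usual in CNN implementations.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 image containing a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to a dark-to-bright transition from left to right.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 map, nonzero only at the edge
pooled = max_pool2d(feature_map)           # 2x2 map after 2x2 max pooling
```

The same kernel weights are reused at every image position, which is the weight sharing mentioned in the text, and pooling keeps only the strongest response in each window.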

CNN training is mainly divided into three stages. First, in the forward propagation stage, the input enters the network and is processed by its layers, including convolution, pooling, and fully connected layers. Each layer takes the feature maps produced by the previous layer as input and continues the processing until the last layer produces the output. Then, in the backpropagation stage, the output of the forward pass is compared with the label, the residual is calculated, and the BP algorithm is used to propagate the residual backward through each layer. Finally, the residuals of each layer are used to update the network parameters according to a given learning rate; this completes one training iteration of the convolutional neural network, and the next iteration begins. The autoencoder is one of the building blocks of deep neural networks; it is a nonlinear neural network structure that reproduces the input signal as closely as possible [15].

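Assuming that the training set used for neural network training contains $n$ training examples $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})\}$, the objective function is defined as the average loss over these examples:

$$J(W, b) = \frac{1}{n} \sum_{i=1}^{n} L\left(h_{W,b}(x^{(i)}),\, y^{(i)}\right)$$

where $h_{W,b}(x)$ is the network output for input $x$ and $L$ is the loss on a single example.

According to the gradient descent algorithm, the formulas for updating $W$ and $b$ are as follows:

$$W \leftarrow W - \alpha \frac{\partial J(W, b)}{\partial W}, \qquad b \leftarrow b - \alpha \frac{\partial J(W, b)}{\partial b}$$

Here, $\alpha$ is the learning rate of the model, while $\partial J(W, b)/\partial W$ and $\partial J(W, b)/\partial b$, respectively, represent the partial derivatives of the objective function of the network with respect to the weight parameter $W$ and the bias parameter $b$ [16].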
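As a minimal sketch of this update rule, the following example applies gradient descent to a one-dimensional linear model. The squared-error objective, the linear model standing in for the network, the toy data, and the learning rate of 0.1 are all assumptions made for illustration; they are not taken from the experiments in this paper.

```python
from typing import List, Tuple


def objective(w: float, b: float, xs: List[float], ys: List[float]) -> float:
    """Assumed objective: mean over the examples of 0.5 * (w*x + b - y)^2."""
    return sum(0.5 * (w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(w: float, b: float, xs: List[float], ys: List[float],
                  alpha: float) -> Tuple[float, float]:
    """One update: W <- W - alpha * dJ/dW and b <- b - alpha * dJ/db."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / n  # partial derivative w.r.t. w
    grad_b = sum(errors) / n                             # partial derivative w.r.t. b
    return w - alpha * grad_w, b - alpha * grad_b


# Toy data generated from y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
initial_loss = objective(w, b, xs, ys)
for _ in range(2000):
    w, b = gradient_step(w, b, xs, ys, alpha=0.1)
final_loss = objective(w, b, xs, ys)
```

With this data the parameters approach w = 2 and b = 1. A learning rate that is too large would make the iteration diverge, which is why the learning rate is treated as a tuned hyperparameter.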

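The energy function of the RBM network is defined as follows:

$$E(v, h) = -\sum_{i} a_i v_i - \sum_{j} c_j h_j - \sum_{i} \sum_{j} v_i w_{ij} h_j$$

where $v_i$ and $h_j$ are the binary states of visible unit $i$ and hidden unit $j$, $a_i$ and $c_j$ are their biases, and $w_{ij}$ is the weight connecting them.

Using the Bayes formula to find the conditional probability of a neuron being 0 or 1, the formula is as follows:

$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_{i} v_i w_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j} w_{ij} h_j\Big)$$

where $\sigma(x) = 1 / (1 + e^{-x})$, and the probability of a neuron being 0 is one minus the corresponding value.

The energy function can be used to solve the marginal probability, conditional probability, and joint probability of the neuron states in the input layer and output layer of the RBM network [17].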
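The relation between the energy function and the conditional probabilities can be checked numerically. The sketch below assumes the standard binary RBM, with energy E(v, h) equal to minus the sum of the visible bias terms, the hidden bias terms, and the pairwise terms v_i w_ij h_j; the symbol names `a`, `c`, and `w` and all numeric values are illustrative assumptions.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def energy(v: List[int], h: List[int], w: List[List[float]],
           a: List[float], c: List[float]) -> float:
    """Energy of a joint state (v, h) of a binary RBM."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(c_j * h_j for c_j, h_j in zip(c, h))
    interaction = sum(v[i] * w[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


def prob_hidden_on(v: List[int], w: List[List[float]], c: List[float]) -> List[float]:
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * w[i][j] for i in range(len(v)))) for j in range(len(c))]


# Hypothetical RBM with 2 visible and 2 hidden units.
w = [[0.5, -1.0], [2.0, 0.3]]  # w[i][j]: weight between visible i and hidden j
a = [0.1, -0.2]                # visible biases
c = [0.0, 0.4]                 # hidden biases
v = [1, 0]

p_hidden = prob_hidden_on(v, w, c)

# The same probability obtained from the energies of the two states of hidden unit 0,
# holding hidden unit 1 at 0 (the hidden units are conditionally independent given v).
e_on = math.exp(-energy(v, [1, 0], w, a, c))
e_off = math.exp(-energy(v, [0, 0], w, a, c))
p_from_energy = e_on / (e_on + e_off)
```

Both routes give the same value for hidden unit 0, which is the Bayes-rule argument in the text written out for a single unit.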

2.2. Image Recommendation Algorithm

A recommendation algorithm discovers user preferences by analyzing historical behavior information. User information and object information are fed into the user model, and the recommendation algorithm generates the recommendation results. The user's behavior in response to the recommended results produces new data, which are used to update the model of the user's preferences and thus affect subsequent recommendations. The whole recommendation framework is mainly divided into four steps.
(1) First, we perform image preprocessing. The images in the dataset must be preprocessed: basic visual features and salient-area features are extracted, the correlation of image tags is calculated, and hash indexes are established to speed up the search.
(2) We define the hypergraph model by taking images as vertices and each group of k images with similar features as a hyperedge. From the defined hypergraph, the vertex degrees, the hyperedge degrees, and the corresponding diagonal matrices are obtained, and the hypergraph correlation calculations are derived from these matrices [18].
(3) We derive the classification algorithm based on multifunctional heterogeneous hypergraphs. We define a cost function suited to the construction of multifunctional heterogeneous hypergraphs, convert the task into the problem of minimizing this cost function, and derive its solution, which is then converted into an executable formula or a sequence of algorithm steps.
(4) By combining the classification scores of all images in the dataset calculated by the multifunctional heterogeneous hypergraph classification algorithm, we find the k images closest to the input image, which form the final image set. The visual features and label features of photos uploaded by users are preprocessed in order to obtain feature vectors with less noise, and the similarity is initialized based on specific features.
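Step (2) can be sketched as follows, under the assumption that each image contributes one hyperedge containing itself and its nearest neighbors in feature space, so that every hyperedge has k vertices. The Euclidean distance, the one-dimensional toy features, and the function name `build_knn_hypergraph` are illustrative choices and are not confirmed by the paper.

```python
import math
from typing import List


def build_knn_hypergraph(features: List[List[float]], k: int) -> List[List[int]]:
    """Return the incidence matrix H, where H[v][e] = 1 if vertex v belongs to hyperedge e.

    Hyperedge e is centered on image e and contains the k images closest to it
    (the center itself included, since its distance to itself is 0).
    """
    n = len(features)
    incidence = [[0] * n for _ in range(n)]
    for e, center in enumerate(features):
        by_distance = sorted(range(n), key=lambda v: math.dist(features[v], center))
        for v in by_distance[:k]:
            incidence[v][e] = 1
    return incidence


# Hypothetical 1-D feature vectors for five images: three similar images and two others.
features = [[0.0], [1.0], [3.0], [10.0], [11.5]]
H = build_knn_hypergraph(features, k=2)

# Vertex degree: number of hyperedges containing the vertex (row sums of H).
vertex_degree = [sum(row) for row in H]
# Hyperedge degree: number of vertices in the hyperedge (column sums of H).
edge_degree = [sum(col) for col in zip(*H)]
```

The two degree vectors are the diagonals of the vertex-degree and hyperedge-degree matrices that hypergraph learning methods typically build from the incidence matrix.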

Image recommendation algorithms can be divided into the following categories:
(1) Collaborative filtering recommendation algorithms. The basic idea is that, when recommending objects of interest to a user, the system finds the users whose needs and preferences are closest to those of the target user and determines the recommended items according to their interests. Collaborative filtering has two kinds of methods: memory-based and model-based [19].
(2) Content-based recommendation algorithms. The user's historical information is analyzed, such as the documents that the user clicked, the documents that were not clicked, and the bookmarked documents, in order to establish the document themes of the user's profile. According to the similarity calculation method selected by the model, the similarity between each candidate item and the user's preferred documents is calculated, and the items of interest to the user are recommended according to the result [20].
(3) Recommendation algorithms based on graph structure. The user-item rating matrix of the collaborative filtering algorithm can be modeled as a bipartite graph. The graph consists of two sets of nodes, representing users and items, and its edges represent the users' ratings of the corresponding items. The algorithm provides recommendation results by analyzing the structure of this bipartite graph.
(4) Hybrid recommendation algorithms. Hybrid recommendation addresses the weaknesses of the collaborative filtering, content-based, and graph-based recommendation algorithms by combining the results generated by two or more of these models, so that they complement one another, and it recommends the combined results to users [21].
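The memory-based variant of category (1) can be illustrated with a short sketch: the rating a user would give an unseen image is estimated from the ratings of other users, weighted by how similar their tastes are. Cosine similarity over co-rated items, the toy rating table, and the names `cosine_similarity` and `predict_rating` are assumptions for illustration only.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm = math.sqrt(sum(a[i] ** 2 for i in common)) * math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / norm if norm else 0.0


def predict_rating(ratings: Ratings, user: str, item: str) -> float:
    """Similarity-weighted average of the ratings other users gave to the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else 0.0


# Hypothetical ratings: alice agrees with bob and disagrees with carol.
ratings = {
    "alice": {"img1": 5.0, "img2": 1.0},
    "bob": {"img1": 5.0, "img2": 1.0, "img3": 4.0},
    "carol": {"img1": 1.0, "img2": 5.0, "img3": 2.0},
}
prediction = predict_rating(ratings, "alice", "img3")
```

The prediction for alice lies closer to bob's rating of img3 than to carol's, because bob's past ratings match hers more closely.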

2.3. Social Network

Social relations in social networks are divided into directed and undirected relations, and the strength of the connections also varies. Complex social networks are social networks with complex network characteristics, and they are an important branch of both complex networks and social networks. Complex social networks have not only the various characteristics of complex networks but also the attributes and structural forms of social networks. In a complex social network, nodes represent the members of the network. Each member has a certain degree of cognitive ability and rationality. This value orientation and cognitive ability of individual members is an important feature that distinguishes complex social networks, community networks, IoT networks, and language networks from other complex networks. On the whole, the nodes of a complex social network have the following characteristics. (1) The nodes themselves have specific prediction and decision-making capabilities, including the ability to act independently and the ability to predict the overall development of the network. (2) The closeness of a node's relationships with other nodes and its own preferences affect the decision-making of the node. (3) The functions of other nodes and of the entire network also have a certain impact on the diffusion of information in the network [22, 23].

3. Image Recommendation Algorithm Experiment Based on Deep Neural Networks

3.1. Experimental Setup
3.1.1. Experimental Data Set

MovieLens is a dataset widely used to verify the performance of collaborative filtering (CF) algorithms. In the MovieLens dataset, each user has at least 20 interaction records. The Yelp dataset contains 5 parts (business, business attributes, check-in settings, users, and reviews). In order to better evaluate the modified NCF framework, Yelp is first filtered so that each user in the dataset has at least three interaction records. The specific numerical indicators are shown in Table 1.

3.1.2. Parameter Setting

In order to avoid overfitting, we tune the regularization coefficient of each method. The dimension of the latent factor vector of each user (item) is set to 30 in the experiments; that is, embedding size = 30. Four learning rates, [0.0045, 0.001, 0.04, 0.01], are tested through experiments. The optimizers considered are SGD and Adagrad. The MLP hidden layers adopt either a tower structure or a nontower structure. For example, when the MLP uses a nontower structure and the number of hidden layers is 4, the corresponding MLP hidden layer structure is [50, 50, 50]. When the MLP uses a tower structure and the number of hidden layers is 4, the corresponding MLP hidden layer structure is [320, 160, 80, 40] [24].
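The tower structure and the tested settings can be written down compactly. In the sketch below, the halving rule is inferred from the example [320, 160, 80, 40], and sweeping every combination of learning rate and optimizer is an assumption; the paper does not state how the settings were combined. The names `tower_layers` and `configs` are hypothetical.

```python
from itertools import product
from typing import List


def tower_layers(first_width: int, num_layers: int) -> List[int]:
    """Tower structure: each hidden layer has half the units of the previous one."""
    return [first_width // (2 ** i) for i in range(num_layers)]


# Hyperparameter values quoted in the text.
learning_rates = [0.0045, 0.001, 0.04, 0.01]
optimizers = ["SGD", "Adagrad"]
embedding_size = 30

# One configuration per (learning rate, optimizer) pair, assuming a full sweep.
configs = [
    {
        "lr": lr,
        "optimizer": opt,
        "embedding_size": embedding_size,
        "hidden_layers": tower_layers(320, 4),
    }
    for lr, opt in product(learning_rates, optimizers)
]
```

Each entry of `configs` could then be passed to a training routine and compared on the validation set.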

3.2. Evaluation Index

First, based on the timestamp information in the dataset, the interaction records of each user are sorted by time. For the MovieLens dataset [25], the two most recent records of each user form the test set, the next two most recent records form the validation set, and the remaining interaction records of each user are used as the training set. For the Yelp dataset [26], the most recent record of each user forms the test set, the second most recent record forms the validation set, and the remaining interaction records of each user are used as the training set. The validation set is mainly used to tune the hyperparameters of the model, and the test set is used to evaluate its final performance. Performance is measured by the root mean square error (RMSE): the smaller the RMSE value, the more accurate the prediction score of the model [27].
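A minimal sketch of this time-based split and of the RMSE metric is given below. The record layout `(user, item, rating, timestamp)`, the function names, and the rule that users with too few records stay entirely in the training set are assumptions for illustration; `holdout=1` corresponds to the Yelp setting and `holdout=2` to the MovieLens setting.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, float, int]  # (user, item, rating, timestamp)


def split_by_time(records: List[Record], holdout: int = 1):
    """Per user: newest `holdout` records -> test, next `holdout` -> validation, rest -> train."""
    by_user: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_user[record[0]].append(record)
    train, valid, test = [], [], []
    for user_records in by_user.values():
        user_records.sort(key=lambda r: r[3])  # oldest first
        if len(user_records) <= 2 * holdout:
            train.extend(user_records)  # assumed: too few records to hold any out
            continue
        test.extend(user_records[-holdout:])
        valid.extend(user_records[-2 * holdout:-holdout])
        train.extend(user_records[:-2 * holdout])
    return train, valid, test


def rmse(predicted: List[float], actual: List[float]) -> float:
    """Root mean square error between predicted and actual scores."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical interaction records of one user, given out of order.
records = [
    ("u1", "a", 4.0, 30),
    ("u1", "b", 3.0, 10),
    ("u1", "c", 5.0, 20),
    ("u1", "d", 2.0, 40),
]
train, valid, test = split_by_time(records, holdout=1)
error = rmse([3.0, 4.0], [1.0, 4.0])
```

Splitting by time rather than at random keeps the evaluation closer to real use, where the model must predict interactions that happen after those it was trained on.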

3.3. Experimental Process
3.3.1. Model Training Process

In order to train the music classifier based on the convolutional neural network, this article uses 5000 pieces of music with known tags downloaded from websites. Since audio files contain a large amount of data, the cold-start work of the music recommendation system is essentially to extract the main features and condense them, excluding as much “noise” from the music data as possible.

The image is fed into the convolutional layers to extract its features, and the extracted features are further compressed by multiple convolutional layers. Finally, the fully connected layer feeds the softmax function, which outputs a probability value for each of the 9 music types, and the most likely type is selected as the music type [28].
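The final classification step can be sketched as follows. The nine scores stand for the outputs of the fully connected layer and are invented for illustration, as are the function names; only the softmax formula itself is standard.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities; subtracting the maximum avoids overflow."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits: List[float]) -> Tuple[int, float]:
    """Return the index of the most probable class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Hypothetical outputs of the fully connected layer for the 9 music types.
logits = [0.2, 1.5, -0.3, 3.1, 0.0, 0.7, -1.2, 2.4, 0.1]
probs = softmax(logits)
label, confidence = predict_class(logits)
```

The probabilities sum to one, and the predicted type is simply the position of the largest score.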

3.3.2. Model Testing Process

After the model training is completed, 20% of the 5000 pieces of music in the training set are selected as the test set of the model, and the labels of these 1000 pieces of music are compared with the music types output by the multilayer classification neural network. If the two agree, the model has classified the piece correctly; otherwise, the classification is wrong [29].

4. Algorithm Experiment Analysis

4.1. Stability Analysis

In order to analyze the stability of the prediction algorithm, this paper uses CCS algorithm, central graph algorithm, and Las algorithm to predict the results of different nodes in the same sampling space. In each group of experiments, 1000 random samples were taken from the same sampling space, and 1000 prediction results were obtained. The confidence interval (CI) was used to analyze the 1000 results. In the confidence interval processing, this paper first sorts the performance parameters of 1000 prediction network results and then removes the first 2.5% and the last 2.5% to get the distribution curve of performance parameters. Finally, this paper compares the confidence distribution of performance parameters of prediction network results with the performance parameters of real network results and analyzes the stability of the algorithm under the condition of random sampling [30]. The density CI distribution curves of CCS algorithm, central graph algorithm, and Las algorithm under the same sampling rate and different sampling nodes are shown in Figure 1. From the graph, the network density distribution and clustering coefficient distribution of 1000 measurement results under different sampling rates are within the closed curve range. When the sampling space is small, the network density confidence interval distribution and clustering coefficient confidence interval distribution of 1000 random sampling prediction results of CCS algorithm are smaller than those of Las algorithm and central graph algorithm, and they are close to the performance parameters of real network. With the increase of sampling rate, the distribution of network density confidence interval and clustering coefficient confidence interval distribution of the three prediction algorithms show convergence and gradually decrease. 
However, compared with the other two algorithms, the network density confidence interval distribution and clustering coefficient confidence interval distribution of CCS algorithm are closer to the real network performance parameters. When the sampling rate is large, CCS prediction algorithm is better than Las algorithm and central graph algorithm, and the prediction result of CCS prediction algorithm is close to the real network density. Therefore, compared with the other two algorithms, CCS algorithm fluctuates less across different sampling nodes and is more stable.

In a social network, the more connections a node has with other nodes in the network, the closer the node is to the core position of the social network and, correspondingly, the greater its influence on other nodes. If a node has few relationships with other nodes in the network and is relatively isolated, then it is in an edge position in the social network. In social network analysis, the degree centrality of a node is the number of other nodes that have a relationship with the node. It should be noted that, in a directed social network, the degree of a node is divided into out-degree and in-degree. Degree centrality can be further divided into absolute centrality and relative centrality. The absolute centrality of a node is the number of other nodes in the network that are directly related to the node. The higher the degree of a node, the more important the node is in the network and the greater the influence it can exert on other nodes. However, degree centrality only counts the nodes directly associated with the node and does not consider the nodes indirectly associated with it. Therefore, the centrality here is only local centrality, which reflects that the node is important in a certain part of the network but not necessarily in the whole network [31].
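Two of the computations in this subsection can be sketched briefly: trimming the lowest and highest 2.5% of the 1000 results to obtain the confidence interval, and counting direct neighbors to obtain degree centrality. The sketch assumes an undirected network, and it assumes that relative centrality is the degree divided by n - 1, a common normalization that the paper does not spell out. Function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def central_95_interval(values: List[float]) -> Tuple[float, float]:
    """Sort the results, drop the lowest 2.5% and the highest 2.5%, return the remaining range."""
    ordered = sorted(values)
    cut = len(ordered) * 25 // 1000  # 2.5% of the samples, rounded down
    kept = ordered[cut:len(ordered) - cut]
    return kept[0], kept[-1]


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, float]]:
    """Map each node to (absolute centrality, relative centrality) in an undirected graph."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: (len(adj), len(adj) / (n - 1)) for node, adj in neighbors.items()}


# Hypothetical performance values from 1000 sampled predictions.
interval = central_95_interval([float(i) for i in range(1000)])

# Hypothetical network: node "a" is a hub, node "e" hangs off node "d".
centrality = degree_centrality([("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")])
```

Node "a" has the highest degree centrality here, yet the measure says nothing about nodes reached only indirectly, which is the limitation noted at the end of the paragraph.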

4.2. Analysis of the Influence of CNN Parameters

Through the comparison of CNN parameters, the final spatial CNN network structure is determined. After LBP and other features such as RTU and BG3 are extracted, an SVM is used as the classifier. LBP is the local binary pattern described in [32], BG3 is explained in [33], and RTU refers to the remote terminal units found in [34]. The static texture recognition results are shown in Table 2 and Figure 2. We compare the accuracy (ACC), true positive rate (TPR), and true negative rate (TNR). The experimental results show that the convolutional neural network recognizes the static texture of smoke better. The accuracy of the traditional feature algorithms is generally between 91% and 93%, while the CNN raises it to 99%, which is a significant improvement. The true positive rate and true negative rate increase accordingly. In particular, the true positive rate of the CNN reaches 99.76%, indicating that its recall is extremely high. For every method, the true negative rate is lower than the true positive rate; even the true negative rate of the CNN is only 98.22%, which means that the algorithm is somewhat lax in excluding nonsmoke areas [35]. Each frame is divided into 130 small blocks, so using static texture detection alone results in more false detections of nonsmoke blocks in the final video. Therefore, further dynamic texture detection is crucial. A CNN extracts image features layer by layer. The first convolutional layer extracts the lowest-level features of the image, turning pixels into lines in all directions. The second convolutional layer extracts the next level of features, forming contours from the lines. The last layer yields the highest-level representation. Therefore, the number of network layers in a CNN is related to the size of the input image. Based on the LeNet network structure, this paper adopts a basic network structure with two convolution and pooling layers and two fully connected layers. On this basis, we adjust the size of the convolution kernels, the number of convolution kernels, the number of neurons in the fully connected layers, the activation function, the pooling method, and other parameters to determine the final network structure of this article [36]. The CNN is trained on a GPU machine with the Adam optimizer, a learning rate of 1e-4, and a batch size of 32.
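For reference, the three indicators compared above follow directly from the confusion matrix of a binary classifier (smoke versus nonsmoke blocks). The counts in the sketch below are invented to show the arithmetic and are not the values behind Table 2; the function name `binary_metrics` is hypothetical.

```python
from typing import Tuple


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (ACC, TPR, TNR) from confusion-matrix counts.

    ACC = (TP + TN) / all samples
    TPR = TP / (TP + FN), the share of smoke blocks that are detected (recall)
    TNR = TN / (TN + FP), the share of nonsmoke blocks that are correctly rejected
    """
    acc = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return acc, tpr, tnr


# Hypothetical counts: 100 smoke blocks and 100 nonsmoke blocks.
acc, tpr, tnr = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

A true negative rate below the true positive rate, as in this toy case, means that nonsmoke blocks are misclassified more often than smoke blocks, which is the pattern the text reports for all methods.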

The system time complexity analysis is shown in Figure 3. In terms of algorithm time complexity, we calculate the average time per frame of the three processes of video preprocessing, static texture analysis, and dynamic texture analysis. The video preprocessing part takes 19 milliseconds per frame. With GPU acceleration, the most expensive process is dynamic texture analysis, because multiframe images are used as input, the network model is large, and the time complexity of the optical flow calculation is high. The second most expensive is video preprocessing, because this process divides the entire image into small blocks and then performs motion detection and color detection. After this process, the number of candidate smoke blocks is greatly reduced, so the static texture analysis takes less time. With CPU calculation only, both static texture analysis and dynamic texture analysis cost more time: the static part is nearly 5 times slower than with GPU acceleration, and the dynamic part is about 1.5 times slower. GPU acceleration is less efficient in the dynamic part than in the static part because of optimization problems in the code implementation. First, the optical flow calculation does not use the GPU, because the blocks used in this article are small and computing optical flow on the GPU does not achieve the expected effect; this part adds a certain fixed cost to the dynamic analysis. Second, Caffe computes in batches, that is, it processes a fixed number of samples at a time, whereas the number of candidate smoke blocks to be analyzed dynamically differs from frame to frame in our experimental program. We pad the batches in the implementation, which causes redundant calculations in some frames, and GPU computing also has a large data transmission cost. In summary, when the smoke detection system constructed in this paper is accelerated by GPU, the average time per frame is 70 milliseconds, which meets the requirement of near-real-time calculation. When GPU acceleration is not used, the average time per frame is 140 milliseconds [37].

4.3. Comparison and Analysis of Accuracy, Recall, and F1 Metrics

The average recognition accuracy rate is shown in Table 3 and Figure 4. From Table 3, it can be seen that the recognition rate of the proposed method on the MNIST and basic datasets is not much different from that of the existing methods. On the variant datasets, however, the marginalized deep autoencoder with adaptive noise has obvious advantages over the existing marginalized method. Figure 4 shows the data in Table 3 in a more intuitive way. The recognition accuracy of the deep autoencoder has increased by 2.7% on average. This method shows strong learning ability and strong antinoise performance. Compared with the existing denoising autoencoder, the training time of mDAE and AmDAE is greatly reduced, reflecting the lower time complexity of the marginalization method. The training times of the improved AmDAE and mDAE models are not much different. On the standard MNIST variant dataset, the model training time increases approximately linearly with the number of training layers, indicating that the time complexity of the deep neural network algorithm is positively correlated with the number of layers of the model.

After the experimental measurement data are preprocessed, the deep neural network model trained on the simulation data is used for image reconstruction. The reconstructed image is much better than those obtained by the CG method and the BP network method. The image not only has fewer artifacts but also clearly reconstructs the shape and edge contour of the medium in the field. The reconstructed image obtained by the CG method has the most artifacts and the worst imaging effect at the center. At the same time, square and triangular plastic rods cannot be distinguished, and distortion occurs. When the BP network is trained with samples of one shape, the reconstructed image also contains a lot of artifacts. Although its reconstructed image is better than that of the CG method and recovers the shape of the plastic rod, the edge contour of the object is not accurate, and distortion even appears in the reconstruction of the square and triangle models [38]. The average mean square error of the reconstructed image obtained by the CG method is the largest, followed by the BP network method. After the experimental data are preprocessed and passed through the DNN method, the mean square error is the smallest, so the reconstructed image has a higher quality.

By comparing the experimental results of the CRF model based on traditional machine learning, the CRF model combined with part-of-speech features, and the deep learning-based RNN, LSTM-RNN, and BLSTM-RNN models, it can be found that the accuracy, recall rate, and F value obtained by the deep learning models are much higher than those obtained by the traditional machine learning CRF model. This shows that, in the task of Weibo evaluation object extraction, the deep learning-based recurrent neural network model in this article is better than the traditional CRF model. By comparing the experimental results of the RNN model, the LSTM-RNN model, and the BLSTM-RNN model, it can be found that the BLSTM-RNN model is better than the other deep learning models in both precision and recall, which shows that, among the deep learning models, the BLSTM-RNN model handles the evaluation object extraction problem best. Combined with the experimental results, the main reasons why the BLSTM-RNN model achieves good results are as follows. On the one hand, the bidirectional recurrent neural network model can better preserve the historical information of the text and make full use of the structural advantages of the RNN network model to learn features. On the other hand, the word vectors trained by Word2vec can describe the feature distribution of the original input data at a more abstract level, which is difficult for most manual feature extraction methods to achieve [39].
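The accuracy, recall, and F value used in this comparison can be computed as in the following sketch for an extraction task, where the model output and the annotation are both sets of extracted evaluation objects. Treating the reported accuracy as precision over the extracted set is an assumption, and the example sets and the function name are invented for illustration.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of a predicted set against the annotated (gold) set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical evaluation objects extracted from a set of posts.
predicted = {"screen", "battery", "price", "camera"}
gold = {"screen", "battery", "camera", "speaker", "size"}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Because F1 is the harmonic mean of precision and recall, a model must do well on both to obtain a high F value.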

5. Conclusions

This paper mainly studies image recommendation algorithms based on deep learning to make up for the shortcomings of traditional text-based image recommendation methods. The main research content is the dimensionality reduction algorithm based on multiattribute image classification, target detection algorithm optimization, and convolutional neural network features. Based on this, an image recommendation prototype system is designed. By studying the construction of individual-level knowledge networks and the factors that influence individual-level knowledge processes in social networks, this paper also provides new ideas for the construction of individual-level knowledge networks from the viewpoints of functional processes and the overall system.

With the rapid development of the Internet and social networking sites, more and more users upload photos to the Internet and put tags on the photos. At the same time, with the convenience of uploading photos, the amount of photos has become larger and larger, and the requirements for the sorting, management, and retrieval of photos have also become higher. Image recommendation is a very challenging research field. The image recommendation algorithm based on deep neural network reduces the time complexity of the algorithm and improves the recognition accuracy of the algorithm.

In basic research on social networks, identifying important nodes by evaluating the importance of the nodes in the network has important practical value. By searching for important nodes, one can understand the hierarchical structure of a highly organized network and thus gain a deeper understanding of its organizational structure. From the point of view of node importance, this paper proposes a new method for measuring the hierarchical structure of highly organized networks, which evaluates the importance of nodes from both a local and a global perspective using the CNN method. The weights of the local information and the global information are optimized to improve the accuracy of the comprehensive measurement and finally to mine the true layered structure of the network. The weights are optimized with a grid search algorithm, which automatically finds the best weights by iterating over candidate values.
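The weight optimization can be illustrated with a minimal grid search. The sketch assumes that the combined importance of a node is w times its local score plus (1 - w) times its global score and that candidate weights are judged by squared error against reference importance values; both the combination rule and the criterion are assumptions, since the paper does not specify them, and all numbers are invented.

```python
from typing import List, Tuple


def grid_search_weight(local: List[float], global_: List[float],
                       reference: List[float], steps: int = 10) -> Tuple[float, float]:
    """Try w = 0, 1/steps, ..., 1 and return the weight with the smallest squared error."""
    best_weight, best_error = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        error = sum(
            (w * l + (1 - w) * g - r) ** 2
            for l, g, r in zip(local, global_, reference)
        )
        if error < best_error:
            best_weight, best_error = w, error
    return best_weight, best_error


# Hypothetical local and global importance scores of three nodes.
local_scores = [1.0, 0.0, 0.5]
global_scores = [0.0, 1.0, 0.5]
# Hypothetical reference importance, built here as 0.3 * local + 0.7 * global.
reference_scores = [0.3, 0.7, 0.5]

best_weight, best_error = grid_search_weight(local_scores, global_scores, reference_scores)
```

A finer grid (larger `steps`) gives a more precise weight at the cost of more evaluations, which is the usual trade-off of grid search.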

Data Availability

All data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Shaohui Du and Zhenghan Chen contributed equally to this work.