The diversity of big data in the Internet of Things is one of the important characteristics that distinguish it from traditional big data. IoT big data is often composed of a variety of data with different structural forms, and the descriptions of the same thing by these different modal data are partly independent yet strongly related. Accurately and efficiently extracting and processing the fusion information hidden in IoT big data helps to solve many current multimodal data analysis tasks. In this paper, a multimodal interactive feature fusion model based on an attention mechanism is proposed, which provides more efficient and accurate information for emotion classification tasks. First, a sparse denoising autoencoder is used to extract text features; second, image features are extracted by a variational autoencoder; finally, an interactive fusion module is constructed that lets text features and image features learn each other's internal information, and the combined features are applied to the emotion classification task.

1. Introduction

A protocol for sharing system information in wireless sensor networks, an energy-aware multipoint relay protocol based on hybrid MAC, has been proposed. Simulation results show that, compared with on-demand DMDS, it can reduce the corresponding power consumption, and compared with the traditional system data sharing protocol, it has a short transmission delay [1]. Surveillance and spatial perception systems that need to capture panoramic images pose new challenges in data fusion. Image fusion from a multisensor array can be realized through data acquisition and computer processing; design considerations such as delay, the control loop for source data aggregation, health monitoring, and data rate processing of such systems have been discussed [2]. The mission of the advanced volume sensor is to develop a low-cost detection system that identifies ship damage management status and provides real-time threat-level information for damage management events. NRL built two prototype systems with multicockpit sensors, and the test results show that the volume sensor prototype is equivalent or superior to commercial video detection systems and point detection systems on the key quality indicators of fire detection [3]. Dedicated surveillance services based on a UAV fleet have been investigated and demonstrated; their purpose is to support threat detection through data fusion, enhance operators' situational awareness, and reduce their workload. To this end, a distributed monitoring system has been proposed to strengthen the detection, high-level data fusion, and autonomy capabilities of UAVs [4]. A new distraction detection method has also been proposed, analyzed using naturalistic driving data and video surveillance records from the Shanghai Naturalistic Driving Study. The application handles complex interference by weighing focus characteristics, which provides a methodological basis and technical support for driver behavior early-warning systems [5].
Domain-adaptive learning is a special transfer learning method in which the source domain and the target domain usually have different data distributions but must perform the same task. One such method has been verified on domain-adaptive classification tasks with the PointDA-10 dataset; empirically, it shows strong performance comparable to standard baselines and even better than state-of-the-art results [6]. Network security situational awareness is an effective way to analyze complex network security situations. The concept and model of network security situational awareness have been formulated, together with a new situational awareness model that focuses on the attributes of multisource data in network security research and introduces a situational awareness algorithm based on data fusion; the results can reflect the overall security status of the network [7]. For companies that build situational awareness from multisource data, image exploitation is becoming more and more important. An image exploitation application has been introduced that uses only image content to detect objects of interest and automatically creates and stores the spatial and temporal relationships among images, cameras, and objects [8]. Data fusion technology is widely used in automatic object recognition systems. The problems of data aggregation systems are intrinsically complex, involving both randomness and vagueness; a method has been proposed to determine the basic probability reported by each sensor and to fuse the sensor reports using the classic Dempster combination rules [9]. A new method of mobile station position tracking under mixed LOS/NLOS conditions has also been proposed. The algorithm is flexible: it supports various measurement methods and asynchronous or synchronous observation data, which makes it especially suitable for future interoperable positioning systems [10]. In addition, a new method for early detection of breast cancer using ultrawideband microwave imaging technology has been proposed.
Based on the fusion concept of TOA data, the position and scattering intensity of the main scatterers are then estimated. Compared with existing methods, this method has higher computational efficiency because the scanning process is localized to a few candidate positions [11]. In order to determine the reliability of each sensor and how to fuse the sensors' measured data, a data fusion method based on fuzzy theory has been proposed and studied in measurement applications. The measurement example shows that this method is feasible, gives priority to stable and reliable sensors, achieves good measurement results, and is simple, effective, and convenient for real-time measurement [12]. To exploit comparative advantages and overcome the limitations of different technologies and data sources, basic methods must be developed, which motivates the data fusion methods for improving material position estimation developed in this paper. The results show that the method can improve position estimation and is robust to measurement noise and future technical developments [13]. A method of synthesizing high-resolution and low-resolution data for a scanning system has been introduced; depending on the complexity of the parts, the modeling process can be partially or fully automated. This way of combining data improves the flexibility of scanning while maintaining the accuracy of the results [14]. In data fusion, the linear combination method is very flexible because it can give different weights to different systems. Some medium power functions have been found to be more effective in combining data than the simple weighting mode, while the combination can be realized just as efficiently [15].

2. Data Perception Method

2.1. Artificial Data Perception Method

The programmer uses a general-purpose programming language or a specially designed scripting language to write a personalized data-aware wrapper for the specific structure of each web page. Because the wrapper's data-awareness rules are closely matched to the page structure, the quality and efficiency of data awareness are usually high. The shortcoming of this method is that once the page changes, the wrapper loses its data perception ability and needs manual modification; the maintenance cost is therefore relatively high, which makes it unsuitable for large-scale commercial use.

2.2. Semiautomatic Data Perception Method

Because of the high learning cost and maintenance cost of manually constructed data-aware wrapper, semiautomatic data-aware wrapper came into being. This method requires certain manual operation and assists the generation of wrapper through data annotation. Usually, these annotation operations are relatively simple, and annotators can complete them without mastering programming knowledge. The commonly used semiautomatic data-aware wrappers are divided into two categories: one is the wrapper constructed by inductive derivation, including pattern rule method and template tree matching method. The other is a wrapper constructed by machine learning method, which trains statistical models from the feature data of web pages to realize data perception and analysis.

2.3. Automatic Data Perception Method

The methods that can generate data perception rules without user participation and manual labeling of training samples are collectively called automatic data perception methods. Commonly used automatic data-aware wrappers are divided into three categories: (1) ontology-based data-aware wrappers, (2) visual-based data-aware wrappers, and (3) repetitive similar subtree-based data-aware wrappers, all of which can adaptively adjust data-aware rules to adapt to the changes of web page structure.

3. Feature Fusion Model

3.1. Summary of AE-IFF Model

The purpose of this model is to correctly predict the emotional polarity of texts. The bottom layer of the model consists of a text attribute separation module and an image attribute separation module. Their main work is to transform the text into a matrix of word vectors of fixed dimension and to convert the image into a fixed-size vector. The next layer of the model is a feature synthesis module that includes a fine-grained attention mechanism, which interactively learns shared hidden representations of text and images and allows modal synthesis. On top of the model is a fully connected layer, which combines the two primary-and-secondary feature outputs of the module as the input of the classifier to perform the emotion classification task.

3.2. Text Feature Extraction Module

The function of the text feature extraction layer is to map each word to a low-dimensional vector called a word embedding. There is a lot of noise in social media text data, which affects the accuracy of feature extraction. In order to eliminate this interference and obtain more effective features, this chapter uses a sparse denoising autoencoder to extract text features.

3.2.1. Construction of a Sparse Autoencoder

The concept, basic principle, and formal representation of the autoencoder have been briefly described earlier. When the number of hidden units is smaller than the number of input units, the autoencoder performs well at feature extraction. In contrast, if the number of hidden units exceeds the number of input units, an additional sparsity constraint needs to be added to the loss function of the autoencoder, which is then called a sparse autoencoder.

In this section, some improvements are made to the common autoencoder by adding sparsity constraints. Inspired by the fact that the number of activated neurons in the brain at any moment is very small, we make the activations in the neural network take a sparse form.

Assuming that the input is x and the activation of the j-th hidden unit is a_j(x), the average activation over the m training samples is

\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j(x^{(i)})

After the average activation is defined, the sparsity constraint can be given as

\sum_{j} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1-\rho) \log \frac{1-\rho}{1-\hat{\rho}_j} \right]

In the above formula, \rho is usually called the sparsity parameter. In order to achieve the biological characteristics described above, \rho is usually set to a very small value. This paper makes an empirical study of the selection of the sparsity parameter; in many published works, researchers mostly choose a number far less than 1. In order to control the weight of the sparsity term, a hyperparameter \beta is introduced, and the final loss function is

J_{\mathrm{sparse}}(W, b) = J(W, b) + \beta \sum_{j} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)

In addition, \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) is the KL divergence, which measures the degree of difference between \rho and \hat{\rho}_j, and J_{\mathrm{sparse}} denotes the loss function of the sparse autoencoder. Similar to the selection of the sparsity parameter, \beta can be set to 3 or 5; in practice, this way of setting the parameters works well. Obviously, \beta cannot be 0, because that would eliminate the effect of the sparsity constraint.
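To make the penalty concrete, the sparsity term above can be computed as in the following minimal NumPy sketch; the function name, the clipping constant, and the default \rho = 0.05 are illustrative choices rather than values fixed by the paper.

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.05):
    """KL-divergence sparsity penalty of a sparse autoencoder.

    activations: (m, h) hidden activations for m samples (assumed in (0, 1),
                 e.g. sigmoid outputs)
    rho:         target sparsity parameter, a value far smaller than 1
    Returns the sum over hidden units j of KL(rho || rho_hat_j).
    """
    # average activation of each hidden unit over the m samples
    rho_hat = np.clip(activations.mean(axis=0), 1e-8, 1 - 1e-8)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return float(kl.sum())
```

The penalty vanishes when every unit's average activation equals \rho and grows as the units become more active, which is exactly the behavior the constraint encodes.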

3.2.2. Construction of the Denoising Autoencoder

Denoising autoencoders are usually used to learn more robust features. In addition to the normal encoding and decoding stages, they include a corruption operation before encoding. The corruption operation destroys the input feature vectors of the original data, which is similar to adding artificial noise. The corruption of the original data x by the denoising autoencoder can be expressed as

\tilde{x} \sim q_D(\tilde{x} \mid x)

In the above formula, q_D generates a corrupted sample \tilde{x} with the same dimension as x. After obtaining the corrupted data, encoding and decoding can be performed on it, with the encoding stage given by

h = f(W\tilde{x} + b)

In contrast, the decoding process remains unchanged. It should be noted that when calculating the actual loss value, the reconstruction is still compared with the original, uncorrupted data. There are three common corruption operations for denoising autoencoders:
(1) Gaussian noise: \tilde{x} = x + \varepsilon, with \varepsilon \sim N(0, \sigma^2 I)
(2) Masking noise: randomly selecting some entries of the sample and setting their values to 0
(3) Salt-and-pepper noise: individual entries are randomly selected based on a binomial distribution, and their values are set to a predefined maximum or minimum
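The three corruption operations can be sketched in NumPy as follows; the corruption fraction p, the noise level sigma, and the lo/hi extremes are illustrative defaults, not values prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(x, sigma=0.1):
    # additive Gaussian noise: x_tilde = x + eps, eps ~ N(0, sigma^2)
    return x + rng.normal(0.0, sigma, size=x.shape)

def masking_noise(x, p=0.3):
    # randomly set a fraction p of the entries to 0
    mask = rng.random(x.shape) >= p
    return x * mask

def salt_pepper_noise(x, p=0.3, lo=0.0, hi=1.0):
    # a fraction p of the entries is set to the predefined min or max value
    corrupt = rng.random(x.shape) < p
    values = np.where(rng.random(x.shape) < 0.5, lo, hi)
    return np.where(corrupt, values, x)
```

Each function returns a corrupted copy of the input with the same shape, as required by the encoding step above.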

It should be noted that in practical applications, the denoising autoencoder is not used for denoising itself but for learning more effective and robust features. As for the specific meaning of effectiveness, in this section it refers to obtaining higher classification accuracy with the features learned by the denoising autoencoder. The weight matrix can be initialized with small random values, which helps avoid degenerate weight configurations during training.

3.2.3. Training the Sparse Denoising Autoencoder

Generally speaking, the difference between a denoising autoencoder and a sparse denoising autoencoder is that the latter adds a sparsity constraint to the loss function. The training of a sparse denoising autoencoder mainly includes the following five steps:
(1) Corrupt the original dataset to obtain the corrupted input
(2) Perform forward propagation with the corrupted input and obtain the output
(3) Calculate the loss value against the original, uncorrupted data
(4) Optimize the weights and biases using back propagation
(5) Repeat steps (2)-(4) until the loss function converges
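The steps above can be sketched as one training iteration for a single-hidden-layer sigmoid autoencoder with a squared-error reconstruction term; the masking rate, learning rate, and parameter shapes here are our own assumptions, not the paper's exact configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, W1, b1, W2, b2, rho=0.05, beta=3.0, lr=0.1, rng=None):
    """One iteration of sparse denoising autoencoder training (steps 1-4)."""
    if rng is None:
        rng = np.random.default_rng(0)
    x_tilde = x * (rng.random(x.shape) >= 0.3)      # step 1: corrupt the input
    h = sigmoid(x_tilde @ W1 + b1)                  # step 2: forward pass
    y = sigmoid(h @ W2 + b2)
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    # step 3: the loss compares the reconstruction with the ORIGINAL x, not x_tilde
    loss = 0.5 * np.mean(np.sum((y - x) ** 2, axis=1)) + beta * kl.sum()
    m = x.shape[0]
    d_y = (y - x) * y * (1 - y) / m                 # step 4: back propagation
    d_sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / m
    d_h = (d_y @ W2.T + d_sparse) * h * (1 - h)
    W2 -= lr * (h.T @ d_y);        b2 -= lr * d_y.sum(axis=0)
    W1 -= lr * (x_tilde.T @ d_h);  b1 -= lr * d_h.sum(axis=0)
    return float(loss)                              # step 5: loop until convergence
```

Calling `train_step` repeatedly on mini-batches implements the five-step procedure.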

In general, histograms allow us to measure the similarity between the original text data and the reconstructed text data from a statistical point of view. However, because commonly used social media texts are low-dimensional and strictly preprocessed, the evaluation results obtained with histograms are not clear. Therefore, this paper uses the correlation coefficient to evaluate the consistency of the text data before and after reconstruction, calculated as

r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}}

In the above formula, x represents the original data, y represents the reconstructed data, and \bar{x} and \bar{y} are the mean values of x and y, respectively.
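The correlation coefficient above is the ordinary Pearson coefficient and can be computed directly; this NumPy helper is a sketch, with the function name chosen here for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Correlation coefficient between original data x and reconstruction y."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc, yc = x - x.mean(), y - y.mean()          # center both series
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

A value close to 1 indicates that the reconstructed text features preserve the structure of the original data.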

By using the sparse denoising autoencoder, the model can extract the most representative key features from the text dataset, reduce the interference of noise, improve the robustness of the text features, and effectively alleviate the overfitting problem.

3.3. Image Feature Extraction

The basic principle and structural characteristics of the variational autoencoder have already been introduced. The purpose of this section is to extract image features containing emotional information from the image data.

The variational autoencoder is a variational extension of the autoencoder and is structurally similar to the classical autoencoder. The decoder reconstructs the encoded result, yielding the reconstruction model; the model maps its hidden features to a probability distribution over the decoded result. The smaller the discrepancy between this distribution and the original data distribution, the better the model.

The network of the variational autoencoder has two components, an encoder and a decoder, and its loss function can be expressed as

L = -\mathbb{E}_{q(z|x)}[\log p(x|z)] + \mathrm{KL}(q(z|x) \,\|\, p(z))

The loss function thus consists of two subterms: a reconstruction term, which measures how well the decoded output matches the input, and a KL term, which pulls the approximate posterior toward the prior.

Data have a certain independence, and some vectors in the original features can represent their independent characteristics well. In this section, a variant of the variational autoencoder, \beta-VAE, is used for the experiments. \beta-VAE introduces a disentangling prior into the variational autoencoder, so the independence of the data can be expressed through different latent variables. This disentangling prior enables the encoder to learn a concise representation of the data, which benefits the subsequent emotion analysis tasks in this section and improves the overall performance of the model. The loss function of \beta-VAE can be expressed as

L_{\beta} = -\mathbb{E}_{q(z|x)}[\log p(x|z)] + \beta \, \mathrm{KL}(q(z|x) \,\|\, p(z))

As can be seen from the above equation, \beta-VAE introduces an adjustable hyperparameter \beta, which keeps a balance between the capacity of the latent dimensions and the accuracy of the reconstructed data. The isotropic Gaussian prior also places implicit constraints on the learning process. The performance of the model is tied to the value of \beta, so \beta must be tuned in the experiments so that the model can use the disentangled representation to learn features at different levels.
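Under the usual assumption of a diagonal Gaussian posterior and a standard normal prior, the \beta-weighted objective has a closed form; this NumPy sketch uses a squared-error reconstruction term and \beta = 4 purely as illustrative choices.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """beta-VAE objective: reconstruction error plus beta-weighted KL term.

    For a diagonal Gaussian posterior q(z|x) = N(mu, diag(sigma^2)) and a
    standard normal prior, KL(q || N(0, I)) has the closed form
    0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1).
    """
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=1)               # reconstruction term
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=1)
    return float(np.mean(recon + beta * kl))
```

Increasing beta penalizes deviation from the prior more heavily, trading reconstruction accuracy for more disentangled latent dimensions, as described above.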

The role of the attention mechanism in the neural networks discussed in this paper is very similar to the relationship between the human visual attention mechanism and the brain. The attention mechanism screens key information out of complex information, and downstream tasks then use this key information to achieve their goals. By analyzing the global features of the input data, the attention module learns the intrinsic meaning of the feature expression and selectively extracts the features containing key information.

In this paper, we construct new input data from the features generated by the pretrained encoder and feed them into the attention module. As shown in equations (10)-(13), the original feature fuses the multisource information in the convolution kernel through convolution with a kernel of size 1, producing two new feature sequences. We then multiply one new sequence by the transpose of the other to obtain their product and normalize the result to obtain the final attention probability distribution. By comparing the attention weights in each channel, this module enhances the effect of the key attribute set and reduces the influence of redundant attributes on the final feature extraction.

Finally, a weighted sum of the attention coefficients and the input feature vectors of the original feature is computed, and the summed result is scaled by a learnable coefficient \gamma, so that the feature expression containing the key information is obtained.

In the above formula, \gamma is initialized to 0 and gradually gains more weight as the model learns the features.
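The attention computation described above (two 1x1 projections, a softmax-normalized product of the two sequences, and a \gamma-scaled residual initialized at 0) can be sketched as follows; the matrix shapes and names are our own assumptions about the layout.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(x, Wq, Wk, gamma=0.0):
    """x: (C, N) feature map with C channels flattened over N positions.

    Wq and Wk play the role of the two 1x1 convolutions producing the new
    feature sequences; gamma is the learnable scale, initialized to 0 so the
    module starts as an identity mapping and gains influence during training.
    """
    q = Wq @ x                          # (C', N) first projected sequence
    k = Wk @ x                          # (C', N) second projected sequence
    attn = softmax(q.T @ k, axis=-1)    # (N, N) attention probability distribution
    out = x @ attn.T                    # re-weight the input features
    return gamma * out + x              # gamma-scaled residual connection
```

With gamma = 0 the output equals the input exactly, matching the initialization described in the text.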

3.4. Interactive Fusion Module

For feature fusion, this paper designs an interactive feature fusion module to fuse the two kinds of features. The features of the two preceding modules are used as the input of the fusion module, and the two input modalities are combined into an output for the target modality. Given a primary input and an auxiliary input, both are projected into the same vector space.

In the above formula, d represents the dimension of the space vector. In this section, the projected primary and auxiliary inputs are used to calculate an attention matrix M.

To measure the importance of the auxiliary input, M is normalized with the softmax function.

Then an attention-based auxiliary input is obtained.

Finally, the projected primary input and the attention-based auxiliary input are fed into the fully connected layer, and the fused feature is obtained.
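One plausible reading of the primary/auxiliary fusion steps above is the following cross-attention sketch; the scaling by sqrt(d), the concatenation, and the tanh output layer are our own assumptions where the original equations are not recoverable.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def interactive_fusion(primary, auxiliary, Wp, Wa, Wf, bf):
    """Fuse a primary-modality feature with an auxiliary-modality feature.

    primary:   (n_p, d_in) main-modality features
    auxiliary: (n_a, d_in) auxiliary-modality features
    Wp, Wa project both into a shared d-dimensional space; the score matrix
    M is normalized with softmax to weight the auxiliary input, and the pair
    is passed through a fully connected layer (Wf, bf).
    """
    P = primary @ Wp                        # (n_p, d) projected primary input
    A = auxiliary @ Wa                      # (n_a, d) projected auxiliary input
    M = P @ A.T / np.sqrt(P.shape[1])       # primary-auxiliary similarity scores
    weights = softmax(M, axis=-1)           # importance of each auxiliary element
    attended = weights @ A                  # attention-based auxiliary input
    fused = np.concatenate([P, attended], axis=1)
    return np.tanh(fused @ Wf + bf)         # fused feature from the FC layer
```

Running the module twice, once with text as the primary modality and once with the image, yields the two complementary fused features used by the classifier below.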

Through the modules described above, the fused features are obtained: some are text-based features supplemented with image information, and the others are image-based features supplemented with text information. We concatenate them, feed them into a final fully connected layer, apply the softmax function, and complete the emotion classification task.

4. Analyze the Experimental Results

4.1. Environment for the Experiment

The algorithm in this chapter is evaluated with MATLAB R2010b on a computer with a 2.53 GHz CPU and 4 GB of memory. In the experiment, the chess dataset from the UCI database is used as the experimental sample to simulate the image information obtained by sensors in real scenes. The basic information is shown in Table 1. The data are classified into two categories, winning and nonwinning. Features have already been extracted, and each sample is a 36-dimensional feature vector. In the data sampling stage, 2000 samples are extracted as the classifier training set, 400 samples as the integration training set, and 500 samples as the test set.

Because the proposed model processes raw data, the experiment also uses a real chair dataset. The data are divided into two categories, chairs and nonchairs, and the image resolution is 800×800. In the feature extraction stage, an ANN is used to extract features independently, and each sample forms a 64-dimensional feature vector. In the data sampling stage, 600 samples are extracted as the classifier training set, 300 samples as the integration training set, and 100 samples as the integration test set.

4.2. Experimental Design

(i)Experiment 1: comparison of accuracy between selected base classifier and unselected ensemble classifier

Because there are many machine learning algorithms and countless variants of each, it is obviously unrealistic to enumerate all strong classifiers and select the best one. Therefore, only for the commonly used machine learning classifiers and the chess dataset, the classifier combination scheme obtained with the base classifier evaluation method proposed in this paper is shown in Table 2.

Because the effect of the boosting ensemble classifier is obviously better than that of bagging, the comparative experiment uses the boosting algorithm to train and generate new base classifiers; that is, subsets of samples are repeatedly selected at random as training sets. The base classifier is a CART tree. Using the boosting algorithm, samples are randomly selected from the training samples to form subsets of 10, 15, 20, and 25 samples, and the 10, 15, 20, and 25 base classifiers generated from them are combined; their recognition rates are calculated on the integrated training samples, and the group with the highest recognition rate is selected. The base classifier evaluation algorithm is then used to select the combinations of the corresponding numbers of base classifiers with the largest difference from the common classifiers, and their respective recognition rates are calculated. Both approaches use the voting method and the linear combination method to complete the classifier integration. Figures 1 and 2 show the recognition rates of classifier combinations of different sizes constructed directly by the boosting method, and Figures 3 and 4 show the recognition rates under the base classifier evaluation method.

As can be seen from Figures 1-4, on the chess dataset, the accuracy of the integrated classifier without the base classifier evaluation method reaches its highest value of 90.2% when the number of classifiers is 20 or 25. With the same numbers of base classifiers, the recognition rate of the ensemble classifier using the base classifier evaluation method reaches 93.9% and 92.2%, respectively. On the chair dataset, when the number of classifiers is 15, the recognition rate increases by up to 4.5%. Therefore, the base classifier evaluation method can effectively improve the recognition rate of the ensemble classifier. At the same time, it can be found that when constructing a multiclassifier ensemble system, more classifiers are not necessarily better.
(ii) Experiment 2: comparison between the FSE integration method and common integration methods

In order to compare the difference between using FSE integration method and common integration methods, FSE integration method is added on the basis of Experiment 1. The specific results are shown in Figures 5 and 6.

As can be seen from Figures 5 and 6, under the chess dataset, the recognition rate of FSE integration method is nearly 2% higher than that of voting method and linear combination. The recognition rate is also about 1% higher under the chair dataset. Therefore, using FSE ensemble method can further improve the recognition accuracy of the final ensemble classifier.

4.3. Analysis of Experimental Results

The experimental results between the model proposed in this chapter and other comparison algorithms are shown in Tables 3 and 4. The trend chart is shown in Figures 7 and 8.

It can be seen from Table 4 that PMF gives the worst performance among the comparison methods, while the method proposed in this paper gives the best. As the data volume increases, the performance of the different models improves, and the method used in this paper consistently shows the best performance.

The experimental results show that the MAE of the model in this chapter is 0.7398, 0.6681, and 0.6127 and the RMSE is 0.8670, 0.8133, and 0.7516 on the MovieLens-100K, MovieLens-1M, and MovieLens-10M datasets, respectively. In comparison, aSDAE, which performs well among the baselines, has an MAE of 0.7487, 0.6911, and 0.6200 and an RMSE of 0.8801, 0.8368, and 0.7865 on the three datasets. The results in Figures 7 and 8 also show intuitively the performance improvement of the model designed in this section. This indicates that, by introducing multisource data features and designing a feature crossover mechanism, this section can mine the feature combination relationships among multisource data well. It can therefore be inferred that the model in this section can not only make full use of the interaction data between users and items but also mine more meaningful information through automatic feature cross-mining, so as to achieve better recommendations.

This section has introduced the model structure and composition of the proposed multisource neural collaborative filtering algorithm, which ensures the efficiency of the recommendation model on the MovieLens-100K, MovieLens-1M, and MovieLens-10M datasets. Compared with NMF, SVD++, PMF, and aSDAE, the proposed recommendation algorithm has higher accuracy.

5. Conclusion

Although algorithms that use multisource data for recommendation have made some progress, there are still many shortcomings to be solved and optimized, and there is considerable room for improving the model, so fusion strategies and recommendation strategies suited to different scenarios can be designed from different angles. The analysis and experimental verification in this paper show that multisource data related to the recommendation task can be fused effectively; modeling user interest preferences more accurately and at a finer granularity, together with a deeper understanding of users, can provide more targeted and valuable suggestions and better recommendation results for different users. Better user interest modeling and more effective features can further improve the capability of the recommendation system.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.