Abstract

With the development of electronic information and network technology, large Internet of vehicles systems produce many kinds of data, such as text, images, and video. These data are heterogeneous and large in volume, with differing features and data structures. In the Internet of vehicles, the beacon message generation strategy must be designed so that it meets the requirements of both vehicle location accuracy and wireless communication performance. Based on the Kalman filter differential prediction equation, a Kalman filter beacon message generation model is established. In deep learning research on underlying data fusion algorithms, the most effective way to address insufficient integration among the underlying data is to improve data quality and to ensure data sharing and reuse across multisource heterogeneous data. Therefore, a D-S evidence theory fusion model and a rough set underlying data fusion model are proposed for the vehicle-mounted cloud network. The D-S evidence theory fusion model improves the quality of the underlying data, forms effective rule combinations, and reduces conflicts between rules by filtering evidence. The rough set underlying data fusion model optimizes the underlying data of each device by improving the rough set attribute reduction method based on the particle swarm optimization algorithm.

1. Introduction

The Internet of vehicles (IOV) is a combination of vehicles, mobile Internet, and Internet of things. It refers to the network interconnection between vehicles and people, vehicles and roads, and vehicles and supporting infrastructure through vehicle-mounted devices or related mobile devices and the use of communication technology, intelligent terminals, and vehicle navigation systems. Thus, the network system of intelligent supervision, vehicle scheduling, and other related functions can be effectively implemented for the whole ecosystem of people, cars, roads, and the environment [1]. Compared with the traditional vehicle management system, the data scale, data types, and real-time data acquisition have made substantial progress. On this basis, more professional data processing technology and the application of more abundant vehicle management services for the heterogeneous underlying data fusion of the Internet of vehicles are still needed.

Although the vehicle-mounted cloud network is also a wireless network, it differs from other wireless networks because it covers the road network: (1) Node mobility: the fast and frequent movement of vehicles makes V2X wireless links short-lived and their capacity variable, and it keeps the network topology changing, so a stable topology cannot form. (2) Predictable vehicle position: the trajectory of a vehicle is constrained, since it always drives along the given direction of the road. Its speed is affected by the vehicle in front, so its position, direction, and speed are predictable to a certain extent. (3) Local information acquisition: because the storage and computing capacity of vehicle nodes is very limited, they cannot store large amounts of data or carry out complex application calculations. Therefore, in most cases, vehicle nodes only need to obtain local traffic information and rely on the network to realize complex applications. (4) Essentially unrestricted energy: the vehicle can carry and continuously replenish energy to power the on-board equipment, which allows that equipment to offer strong performance. (5) GPS and sensor-assisted positioning: many vehicles now carry GPS and other on-board sensors for positioning, which not only give the vehicle an accurate, globally synchronized clock but also provide accurate position, speed, direction, and other state information. These devices support communication and interaction between vehicles well.

The big data scenario formed by vehicle nodes periodically disseminating safety information matches the extended application field of data fusion in the Internet of vehicles. Knowledge discovery over massive vehicle node information through data fusion algorithms can accurately locate vehicles. Based on the vehicle random path prediction model, a location data update strategy for moving vehicles based on BM-KFFPP and a threshold-based beacon information generation strategy are proposed, on the premise of meeting the requirements of vehicle information accuracy and wireless communication performance. To address the loss of vehicle position information during beacon message transmission, a beacon lost-data completion algorithm based on the least squares support vector machine is proposed and verified by simulation on an example. The multisource heterogeneous underlying data fusion framework for the vehicle cloud network uses the ontology idea and the D-S evidence theory method to perform feature fusion on the underlying data, in order to reduce data redundancy, improve data accuracy, and optimize decision-making efficiency. Section 2 describes the related work, Section 3 describes the overall design of data fusion at the bottom of the vehicle cloud network, Section 4 provides example verification, and Section 5 gives the conclusion.

2. Related Work

2.1. Internet of Vehicles

The Internet of vehicles is an information interaction network built on vehicle speed, route, and location information. It is a vehicle-network joint technology aimed at safety, energy saving, environmental protection, and information communication [2]. The Internet of vehicles collects vehicle, traffic environment, and road information through electronic equipment such as RFID, GPS, Beidou positioning, high-definition cameras, sensors, and image processing. According to certain communication protocols and standards, wireless communication and information exchange can be carried out between vehicles and people, roads, the environment, the network, and infrastructure. The cloud computing center uses computer technology to process vehicle data, so as to report road conditions in a timely manner, calculate the best route for different vehicles, arrange the signal cycle, and achieve intelligent scheduling, monitoring, and management of vehicles, people, and roads. The Internet of vehicles is the inevitable result of human society entering the information age and the automobile age and is the extension and application of Internet of things technology in the field of transportation [3]. The Internet of vehicles industry can be regarded as the combination of the vehicle-borne information service system and the intelligent transportation system (ITS) [4]. The vehicle information service system is an information service system that uses on-board electronic equipment to report the driving status of the vehicle in a timely manner. The intelligent transportation system mainly provides the traffic information system, the vehicle management system, and so on, and is an important model for future intelligent vehicles [5].

Underlying data fusion is a multidisciplinary computer technology with applications in many fields. The United States was the earliest and fastest to develop underlying data fusion technology. In the early stage, underlying data fusion was mainly applied in the military field. As early as the 1970s, the United States funded research on sonar signal understanding and fusion and developed a series of C4ISR (Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance) and IW (information warfare) systems [6, 7].

2.2. Underlying Data Fusion Technology

After years of research and development, multisensor underlying data fusion technology has achieved fruitful theoretical and application results. The progress of computer technology, communication technology, and data processing technology also provides new impetus for the development of multisensor data fusion technology. Among them, real-time processing technology plays an increasingly important role in the multisensor underlying data fusion system and is applied more widely. As the system structure becomes more complex and the data scale becomes larger, a real-time data processing strategy is needed to maintain the stable operation of the system and ensure the real-time utilization value of the collected data flow [8]. Due to the heterogeneity of data acquired by multiple sensors, distributed underlying data fusion will also bring into play its potential value. In order to meet the various needs of underlying data fusion, many underlying data fusion models have been proposed, which can be roughly divided into two categories: (1) functional fusion model, which is mainly constructed by the sequence of functions realized by the underlying data fusion in each node, and (2) data fusion model, which is mainly constructed through data extraction in the underlying data fusion [9]. Literature [10] is a fusion model of human-machine information interaction. This model takes into account the role of users in the terminal of the Internet of vehicles and mainly uses the role and reaction of people in the fusion process. The classic fusion method is modified to enhance the independence of different fusion stages, which is beneficial for better coordination of different fusion models. However, the weakness of this fusion model is that the user participates too much in the process, resulting in the lack of entity abstraction. 
The fusion model in literature [11] is applied to the Internet of vehicles in the Internet of things: the whole realization process is the network connection between vehicles and sensors of the surrounding environment infrastructure equipment, and the key to this fusion is the real-time correlation of data [12]. This method is divided into two different systems: data filtering and underlying data fusion, to ensure the accuracy of the data. The adaptive exponential smoothing method is adopted to rapidly fuse the road segment travel time based on a fixed detector and floating car under different reliability, so that the road segment travel time can be obtained accurately and efficiently [13]. Considering the continuity of traffic status (travel time) of adjacent sections, a fuzzy regression model is proposed, and only a small amount of floating car data are needed to accurately predict the travel time of interflow interruption [14].

2.3. Dynamic Road Network Induction

Heterogeneous system data (floating car GPS data, coil detection data, and video detection data) are fused and matched to the GIS to achieve dynamic road network induction [15]. By filtering out the floating vehicle data that are strongly affected by signal control and mining the historical floating vehicle data, a road travel time estimation method for cases with missing signal timing information is proposed. For coarse-grained floating vehicle data [16], the average absolute error of this estimation method is better than that of the traditional direct and indirect methods. A new method for estimating the real-time capacity of the urban road network in the Internet of vehicles based on immune theory is also proposed. This method introduces immune network theory into the vehicle self-organizing network and obtains the traffic flow in the road network through the recognition of statistical antigens and antibodies, which not only solves the real-time problem but also removes the need of traditional methods to establish an accurate mathematical model [17]. In view of the characteristics of traffic information in the Internet of vehicles environment, an improved artificial neural network and an improved support vector regression method are used to predict the road travel time [18].

The research on the underlying data fusion in the aspect of vehicle-mounted cloud network mainly includes the following: Literature [19] puts forward the ontological research on ontology fusion methods among various specialties of high-speed railway in China. Literature [20] uses ontology to solve the problem of disunity between developers and operators of railway information in Europe and realizes data sharing and interoperability. Literature [21] introduces the overall design scheme and key technologies of railway big data platform. Literature [22] mainly studies and analyzes the standardization of high-speed railway data and proposes the construction of standard indicators. Literature [23] is aimed at the problem of multisource heterogeneity of data information in intelligent maintenance decision of the high-speed railway signal system, where the ontology fusion algorithm was used to reduce the computational complexity and time complexity compared with the classical closure algorithm, and the running time of the algorithm was also much lower than that of the classical closure algorithm. Finally, a unified framework for heterogeneous underlying data fusion and intelligent decision-making of the high-speed railway signal system is proposed. The experimental verification and analysis show that the diagnostic accuracy of this framework is significantly improved and its applicability is good. Literature [24] takes the high-speed railway as the research object and adopts feedback D-S evidence theory to optimize the algorithm for problems such as insufficient integration degree caused by multichannel data transmission during train operation, which is verified feasible by experiments. 
Through the research and analysis of the working state of the track circuit, the characteristic attributes of the data are extracted in literature [25], and the fusion of the two kinds of monitoring data of the signal equipment and the microcomputer equipment is realized, which is of practical significance for the research on the fusion of the underlying data of the track circuit. Literature [26] takes the TDCS/CTC system as the main research object and proposes a system hierarchical information aggregation scheme for the problem of high real-time requirements of massive train control data. The scheme was successfully run in the railway general dispatching command center, proving its practicability. Literature [27] uses the Kalman underlying data fusion algorithm to conduct real-time modeling of random vibration interference with the second-order autoregressive model and builds the strapdown inertial testing method for tracking geometric parameters. It is verified that the test degree of the modified system is effectively improved. Through the analysis and study of data transmission characteristics of railway train control in literature [28], the importance of fusion technology to improve safety performance of train control is clearly identified. In this regard, the colored Petri net (CPN) algorithm is used for modeling, and the experimental results show that this method is effective and feasible. Studies have shown that the different semantic alignment of vehicle-mounted data records in cloud network systems can easily lead to the problem of data conflict and information islands. The multisource heterogeneous data fusion method can integrate structured, semi-structured, and unstructured data to ensure data consistency and provide data guarantee for intelligent decision-making in railway operation and maintenance systems.

3. Overall Design of Data Fusion at the Bottom of Vehicle Cloud Network

3.1. Architecture of Underlying Data Fusion

The data fusion system for heterogeneous data in the Internet of vehicles adopts the architecture of C/S (client and server) and B/S (browser and server) for mixed development. The client uses mobile devices with the Android operating system to obtain corresponding data. The heterogeneous data are then uploaded to the Tomcat server through wireless LAN 802.11 (Wi-Fi) or GPRS traffic data, and the Tomcat server cleans the heterogeneous data and transfers them to the corresponding cloud storage platform in real time. The server side mainly refers to the cloud storage platform where data are stored, while the data fusion display side extracts data from the cloud storage platform through related services to achieve related fusion. Figure 1 shows the overall architecture of the system.

The data acquisition layer is composed of various terminal devices installed on the vehicle, mainly using mobile devices with the Android operating system, such as mobile phones, tablets, vehicle intelligent rearview mirrors, and other devices. By using different sensors such as GPS and camera of mobile devices, text, picture, and video data are collected according to the different running time of vehicles. At the same time, the collected data are uploaded to the Tomcat server in real time through 3G/4G or Wi-Fi, and then the data are saved to the corresponding cloud storage platform in real time after relevant service processing.

3.2. Underlying Data Storage and Fusion

The data storage layer uses the cloud storage mode and manages data of the entire system. The Hadoop distributed file system (HDFS) stores uploaded video data in the data acquisition layer, the distributed nonrelational database MongoDB stores image data, and the relational database MySQL stores text data. The classified storage method is adopted to improve the storage efficiency and later utilization efficiency of data. The data interaction between the data storage layer and the data acquisition layer is shown in Figure 2.
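The classified storage rule above can be sketched as a small dispatch function. This is only an illustration: the extension table and the function name `choose_backend` are assumptions introduced here, and the returned strings merely name the backend described in the text rather than calling any real HDFS, MongoDB, or MySQL client.

```python
from pathlib import Path

# Backend named in the text for each kind of uploaded data.
BACKEND_BY_KIND = {"video": "HDFS", "image": "MongoDB", "text": "MySQL"}

# Hypothetical extension table; the original does not list file formats.
KIND_BY_EXTENSION = {
    ".mp4": "video", ".avi": "video",
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".txt": "text", ".json": "text", ".csv": "text",
}


def choose_backend(filename: str) -> str:
    """Return the name of the storage backend for an uploaded file."""
    ext = Path(filename).suffix.lower()
    kind = KIND_BY_EXTENSION.get(ext)
    if kind is None:
        raise ValueError(f"unsupported upload type: {ext!r}")
    return BACKEND_BY_KIND[kind]
```

Keeping the routing decision in one table makes it easy to add a new data type without touching the upload path.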

The data fusion display layer fuses the three kinds of heterogeneous data (text, picture, and video) stored in the data storage layer and displays them on the web front end for users. This layer mainly develops the corresponding web applications on the Node.js platform and Java Web to realize real-time vehicle position display, path tracking, and the fusion and display of uploaded data in real time on the browser side.

Packet routing data transmission is one of the key technologies for information transmission in the Internet of vehicles, and it is also the most common data transmission technology. It has important theoretical significance and practical application value and has a profound impact on the communication networking capability of the Internet of vehicles. In the environment of high mobility of vehicle nodes, topology changes greatly and links are unreliable. Networking routing is a complex task. The key point of routing design is to design and maintain routes from the source node to the destination node to ensure real-time, complete, and effective data transmission. The main challenges of routing protocol design include transmission delay from the source node to the destination node, routing cost and stability, and reliability and QoS. Performance indicators include average end-to-end latency, route cost, packet delivery rate, and throughput. Figure 3 shows the routing protocol diagram of packet transmission.
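As a rough illustration of how three of the performance indicators listed above could be computed from a packet trace, consider the sketch below. The `PacketRecord` structure and its field names are assumptions made for this example; they do not correspond to any NS2 trace format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PacketRecord:
    size_bytes: int
    sent_at: float                 # send time in seconds
    received_at: Optional[float]   # receive time in seconds, None if lost


def routing_metrics(records: Sequence[PacketRecord], duration_s: float) -> dict:
    """Compute delivery rate, mean end-to-end delay, and throughput."""
    delivered = [r for r in records if r.received_at is not None]
    delays = [r.received_at - r.sent_at for r in delivered]
    return {
        # Fraction of sent packets that reached the destination.
        "packet_delivery_rate": len(delivered) / len(records) if records else 0.0,
        # Mean delay over delivered packets only.
        "avg_end_to_end_delay_s": sum(delays) / len(delays) if delays else float("nan"),
        # Delivered payload bits per second over the observation window.
        "throughput_bps": 8 * sum(r.size_bytes for r in delivered) / duration_s,
    }
```

Route cost is omitted because its definition depends on the protocol under test.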

Topology-based routing protocols use existing links to transmit data, which is inefficient, and existing protocols of this kind transmit data poorly in highly dynamic environments. Typical topology-based protocols, namely DSDV (destination-sequenced distance-vector) routing, AODV (ad hoc on-demand distance vector) routing, and DSR (dynamic source routing), are simulated in urban scenarios using NS2. The simulation environment includes a 4 km road section, 80 vehicle nodes, a duration of 90 min, a 6 MHz transmission rate, a transmit power of 28.8 dBm, and a vehicle speed of 10 m/s. The protocols are measured and analyzed in terms of throughput, packet loss rate, and data collisions. Compared with the other two routing protocols, AODV and DSDV, the DSR routing protocol performs better for traffic safety transmission in urban environments. Table 1 compares the transmission performance of typical routing protocols based on topology.

3.3. Bottom-Level Multiple Information Fusion of Internet of Vehicles Information Processing

More research on information fusion focuses on multisensor information fusion and comprehensive processing, so as to get more accurate and reliable conclusions. The multisensor data fusion is generally divided into three levels: data layer, feature layer, and decision layer.

Data layer fusion requires that sensors observe the same physical quantity. If multiple sensors observe different physical quantities, data can only be fused at the feature layer or decision layer. Data layer fusion is the direct fusion of the observation data of the observation sensor. The original measurement data of each sensor are directly fused without analysis and processing. This fusion has the advantage of collecting as much field data as possible, providing raw concrete information for other fusion levels. However, it requires a large number of sensors, high processing cost, long processing time, and poor real-time performance.

Feature layer fusion is the fusion of the information collected by the underlying sensor. Firstly, the original sensor data are collected, and then the features of the original data are analyzed and extracted. Finally, the data are classified and collected according to the characteristic information.

Decision level fusion is a kind of high level fusion, that is, the optimal decision is made according to certain judgment criteria. In order to meet the needs of specific decision problems, this paper makes full use of all kinds of feature information of measurement objects extracted from feature level fusion and adopts appropriate fusion technology to realize it.
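The abstract names D-S evidence theory as the fusion model. One standard building block of that theory is Dempster's rule of combination, sketched below for decision-level evidence. The hypothesis labels and mass values in the usage note are invented for illustration, and the filtering step mentioned in the abstract is not modeled here.

```python
from itertools import product


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions with Dempster's rule.

    Keys are frozensets of hypotheses; values are masses that sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}
```

For example, with `C = frozenset({"congested"})`, `F = frozenset({"free"})`, and `U = C | F`, combining `{C: 0.6, F: 0.1, U: 0.3}` with `{C: 0.7, F: 0.2, U: 0.1}` concentrates most of the mass on `C`, because both sources favor it.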

Assume that the probability model of the measured value of sensor J is described by the Gauss probability distribution function:
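A standard form of such a model is the univariate Gaussian density. The symbols are assumptions, not taken from the original: $x_j$ denotes the reading of sensor $j$ and $\sigma_j$ its standard deviation.

```latex
p_j(x \mid x_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j}
\exp\!\left[-\frac{(x - x_j)^2}{2\sigma_j^2}\right]
```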

As shown in Figure 4, there are three relationships between sensor I and sensor J: (1) sensor I and sensor J are independent; (2) sensor I strongly supports sensor J, and sensor J weakly supports sensor I; and (3) sensor J strongly supports sensor I, and sensor I weakly supports sensor J. If the measurement information of the two sensors is inconsistent, at least one sensor has a measurement error, and fusing the two has no practical significance. When the measured values of the two sensors support each other, the fusion result minimizes the uncertainty and unreliability of the measured values.
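One common way to quantify whether one Gaussian-modeled sensor supports another is a confidence distance: twice the probability mass of sensor i's density lying between the two readings. The sketch below assumes that measure, which the text does not confirm is the one used here; the function names and the 0.5 threshold are likewise assumptions.

```python
import math


def confidence_distance(x_i: float, sigma_i: float, x_j: float) -> float:
    """Return 2 * |integral from x_i to x_j of N(x; x_i, sigma_i^2) dx|.

    For a Gaussian this equals erf(|x_j - x_i| / (sqrt(2) * sigma_i)).
    The value is 0 when the readings coincide and approaches 1 as they diverge.
    """
    return math.erf(abs(x_j - x_i) / (math.sqrt(2.0) * sigma_i))


def support_matrix(readings, sigmas, threshold=0.5):
    """Entry [i][j] is 1 if sensor i supports sensor j, else 0."""
    n = len(readings)
    return [
        [
            1 if confidence_distance(readings[i], sigmas[i], readings[j]) <= threshold else 0
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Because each sensor measures the distance with its own standard deviation, the matrix need not be symmetric, which is consistent with the one-directional strong and weak support relationships described above.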

The rules in fuzzy theory select the minimum or maximum value directly from the dataset u, which is not suitable for the decision value of the fusion sensor k. Therefore, the following algorithm is improved:

The channel load data sequence is used as the dependent variable (reference) data column, and the communication environment parameters and traffic flow data sequences are used as the independent variable (comparison) data columns. The correlation degree of the main influencing factors of channel load in this section is determined by the following calculation steps:

Step 1 (dimensionless processing of data): before the association analysis, because the original data have different dimensions, range standardization is applied to make the original data dimensionless and to transform the data of different times and places into comparable standardized data. The standardized transformation method is adopted for processing the original data.

Step 2: generate the difference data sequence. The difference in the standard data series after dimensionless processing is calculated, that is, the absolute value of the difference between the standardized dependent variable series and the independent variable series.

Step 3: calculate the correlation coefficient according to Deng's correlation degree calculation method.

Step 4: rank the calculated correlation coefficients according to their sizes. The larger the value is, the stronger the correlation degree between the influencing factor and the channel load is at time T; otherwise, the correlation degree is weaker.
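The four steps can be sketched as follows, using the usual form of Deng's grey relational coefficient. Several details are assumptions: the resolution coefficient `rho = 0.5` is the conventional default rather than a value given in the text, the per-time coefficients are averaged into one degree per factor before ranking, and all function and factor names are illustrative.

```python
def range_standardize(series):
    """Step 1: min-max (range) standardization to remove units."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be range-standardized")
    return [(v - lo) / (hi - lo) for v in series]


def grey_relational_degrees(reference, factors, rho=0.5):
    """Return {factor name: relational degree with the reference series}."""
    ref = range_standardize(reference)
    # Step 2: absolute difference sequences against the reference.
    deltas = {
        name: [abs(r - f) for r, f in zip(ref, range_standardize(values))]
        for name, values in factors.items()
    }
    all_deltas = [d for row in deltas.values() for d in row]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0.0:
        # Every factor tracks the reference exactly.
        return {name: 1.0 for name in deltas}
    degrees = {}
    for name, row in deltas.items():
        # Step 3: Deng's coefficient at each time point, then the mean.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees[name] = sum(coeffs) / len(coeffs)
    return degrees


def rank_factors(degrees):
    """Step 4: factor names ordered from strongest to weakest correlation."""
    return sorted(degrees, key=degrees.get, reverse=True)
```

A factor that rises and falls exactly with the channel load receives a degree of 1, while a factor that moves in the opposite direction receives a noticeably lower one.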

To maximize the distance covered by each transmission, the delay forwarding protocol is in essence distance-based forwarding. The forwarding rules try to make each distributed message cover as long a hop as possible; that is, the node furthest from the source node is selected for forwarding, and it transmits the message in a single hop to the subsequent node for the next-hop relay. The sending node transmits a message, and the nodes within its transmission range receive it. On receiving the message, each receiver starts a delay forwarding timer T, whose model is as follows:
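A minimal sketch of a distance-based delay timer is given here under a stated assumption: the timer shrinks linearly with distance, T = T_max (1 - d / R), where d is the distance from the sender and R the transmission range. This linear form is a common choice but is not confirmed as the model used in this paper, and `t_max_s`, `forwarding_delay`, and `elect_relay` are hypothetical names.

```python
def forwarding_delay(distance_m: float, tx_range_m: float, t_max_s: float = 0.1) -> float:
    """Assumed linear timer: farther receivers wait less before forwarding."""
    if tx_range_m <= 0:
        raise ValueError("transmission range must be positive")
    # Clamp so that nodes at or beyond the range edge get a zero delay.
    ratio = min(max(distance_m / tx_range_m, 0.0), 1.0)
    return t_max_s * (1.0 - ratio)


def elect_relay(receivers: dict, tx_range_m: float, t_max_s: float = 0.1):
    """Return the receiver whose timer expires first.

    `receivers` maps a node id to its distance from the sender in meters.
    """
    return min(receivers, key=lambda n: forwarding_delay(receivers[n], tx_range_m, t_max_s))
```

With receivers at 50 m, 120 m, and 200 m and a 250 m range, the node at 200 m has the shortest timer and therefore relays the message.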

In message distribution, it is assumed that the vehicle node VS sends a new message, and all vehicle nodes that receive the message directly within the transmission range consider themselves potential relay and forwarding nodes (RNs). Nodes elected as RNs by the policy distribute the message to other vehicles in their transmission range; nodes that are not elected discard it. If two RNs are close to each other, their transmission coverage areas are basically the same, so conflicting interference may occur. In addition, two elected RNs also need to be within communication range of each other to ensure successful reception of relayed messages.

The appropriate transmission range depends on the transmission rate of the channel and the target bit error rate. Therefore, the protocol is usually designed to maximize the transmission range of a relay configuration that uses as few destination RNs as possible, as long as transmission is ensured.

Beaconless message distribution usually uses the following basic methods to suppress redundant spurious distribution:

(1) Message distribution based on probability: the vehicle node receiving the new message selects the node for message distribution based on probability P, which can be set to a fixed value or dynamically changed. According to the dynamic traffic density scenario, the probability P can be dynamically adjusted according to the number of receiving groups or the interest degree of neighboring vehicles in the message, which has a wider coverage and higher efficiency than the algorithm with fixed probability.

(2) Counter-based message distribution: the vehicle node receiving the new message sets the counter and waiting time. When the vehicle receives the same message within the waiting time, the counter will be accumulated, and at the end of the waiting time period, the counter value will be compared with the set value (which can be fixed or dynamically adjusted according to some rules). If the counter value is less than the threshold, the message is forwarded. Otherwise, the message is discarded.

(3) Message distribution based on delay forwarding timer: the vehicle node receiving new messages sets timer T to delay message distribution. When the value of T expires, the message is forwarded. However, if the same data are received during the delay forwarding timer period, the node is forbidden to forward messages, which is called the suppression rule. The message distribution rule based on delay timer T is as follows: the vehicle with the lowest timer value will distribute the message first, and the other adjacent vehicle nodes whose delay timer does not reach will cancel the message distribution. The setting of delay timer is usually related to distance. The node with a larger distance from the sending vehicle has a smaller T value of delay timer and can obtain the priority of forwarding.
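The first two suppression methods can be sketched as simple forwarding decisions. The class and function names are hypothetical, and one detail is an assumption: the counter here includes the first reception of a message, which the text does not specify.

```python
import random


def forward_with_probability(p: float, rng=random) -> bool:
    """Method (1): forward a newly received message with probability p."""
    return rng.random() < p


class CounterSuppressor:
    """Method (2): forward only if few copies were heard while waiting."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.copies = {}  # message id -> copies heard in the waiting window

    def on_receive(self, msg_id: str) -> None:
        # Assumption: the first reception already counts as one copy.
        self.copies[msg_id] = self.copies.get(msg_id, 0) + 1

    def should_forward(self, msg_id: str) -> bool:
        """Call once when the waiting time for msg_id ends."""
        return self.copies.pop(msg_id, 0) < self.threshold
```

A dynamic variant of either method would recompute `p` or `threshold` from the observed neighbor density before each decision.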

4. Example Verification

The floating vehicle collects the number of vehicles, traffic flow density, speed, and other data within 500 meters once every 5 minutes, with a data scale of 300 samples. The last 24 samples are taken as test samples, and the channel load sequence is taken as the research object for short-term channel load prediction experiments. According to the requirements on periodic beacon message generation rate and packet size in vehicle-mounted security applications, the beacon message size is 800 bytes and the message generation rate is 15 messages/second, that is, the beacon message rate of each vehicle is 96 kbps (800 bytes × 8 bits × 15 per second). The initial communication distance is 250 meters, the initial carrier detection distance is 500 meters, the maximum communication distance is 500 meters, the maximum carrier detection distance is 1500 meters, the minimum allowable channel load threshold is 3 Mbps, the maximum allowable channel load threshold is 6 Mbps, the power adjustment step is 0.01, the channel load prediction period is 1 minute, the average velocity is 58 km/h, and the green interval is 3.8 s. The simulation parameter settings are shown in Table 2.
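The per-vehicle beacon rate and the channel load thresholds can be related by simple arithmetic. The sketch below reads the generation rate as 15 messages per second, which is what the stated 96 K figure implies, and it ignores MAC overhead, collisions, and retransmissions, so the vehicle counts are only rough upper bounds.

```python
BEACON_SIZE_BYTES = 800
BEACON_RATE_PER_S = 15  # assumed to mean messages per second

# Offered load of one vehicle in bits per second.
PER_VEHICLE_BPS = BEACON_SIZE_BYTES * 8 * BEACON_RATE_PER_S


def vehicles_at_load(load_bps: float) -> float:
    """Number of beaconing vehicles whose combined load equals load_bps."""
    return load_bps / PER_VEHICLE_BPS


low = vehicles_at_load(3_000_000)   # minimum allowable threshold, 3 Mbps
high = vehicles_at_load(6_000_000)  # maximum allowable threshold, 6 Mbps
```

Under these assumptions the 3 to 6 Mbps window corresponds to roughly 31 to 62 vehicles within carrier detection range transmitting at the full beacon rate.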

After using the DS algorithm, the average channel load generated by the beacon message near the intersection is shown in Figure 5.

As shown in Figure 5, after a certain oscillation period, the channel load converges to the preset threshold range of 3–6 Mbps. The closer to the intersection, the greater the channel load value. When vehicles leave the intersection, the channel load value begins to decrease significantly. At 400 m away from the intersection, the channel load begins to increase again.

Figure 6 plots, in 3D, the average transmission delay and the MAC layer redundant stray distribution probability of the actual experiment under different vehicle densities P and different vehicle speeds on the experimental road, together with a model established for comparison. The surface in the figure represents the simulation results for the real road, and the model analysis results are represented by dashed lines. The average MAC layer transmission delay is plotted for speeds of 80, 100, and 120 km/h.

Figure 7 shows the relationship between the message distribution rate and the transmission range at different vehicle densities. As can be seen from Figure 7, for values below 400 m, the probability of repeated forwarding is very low, the message distribution efficiency is high, and the influence of vehicle density can be ignored.

As for the relationship between the packet delivery rate (PDR) and the communication distance between two vehicles, the expected PDR is set as 9, the minimum retransmission distance is 50 meters, the maximum number of retransmissions is 5, the retransmission interval is 50 meters, and the maximum retransmission distance is 500 meters. The American Denso V2X on-board unit supporting the DSRC protocol was selected as the vehicle-to-vehicle communication platform, and the underlying data fusion decision algorithm was configured with the above parameters to collect and analyze the position messages generated by vehicles No. 7 and No. 8 in the fleet and their communication with neighboring vehicles. The PDR values without and with the transfer mode algorithm are shown in Figures 8 and 9.

5. Conclusion

For real-time and heterogeneous data acquired in the data awareness layer, relevant databases or distributed file systems of different cloud storage models are adopted for different data formats. The classification and storage of heterogeneous data facilitates related operations in data fusion. A threshold-based message generation strategy for vehicle beacons is proposed. According to the relation of Kalman filter prediction time domain, the message generation model of Kalman filter prediction beacon is established. According to the measured value and preset threshold of channel load, a beacon message generation model and strategy with time interval adaptive adjustment are established. This distributed strategy can effectively reduce channel load, avoid channel congestion, and ensure the fairness of message generation and transmission of each node while meeting the requirements of location information accuracy required by the application of Internet of vehicles. In the case of the large-scale road network, it is necessary to reduce the burden of path calculation in the Internet of vehicles data center and optimize and improve the efficiency of path calculation. In addition, the method in this paper does not consider the special needs of customers, such as the shortest route, the lowest driving cost, the safest travel, and other personalized needs. The above problems are the key points to be improved and perfected in the next step.

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the 2019 Cross Science Research Project of Nanyang Institute of Technology (Grant no. 201913502) (Research on Intelligent Mining and Recommendation of Zhang Zhongjing Prescription Based on Deep Neural Network) and Henan Science and Technology Plan Project (Grant no. 222102210134) (Research on Key Technologies of Cloud Security Desktop Based on Kunpeng Architecture).